More mode 7 | Contents | Assembly |
The other page on text described how you could get text on backgrounds and objects. It worked, but there were several limitations. For instance, it was limited to 8×8 fonts, didn't support all video modes and had no formatted text capabilities.
Tonc's Text Engine (TTE) remedies many of these shortcomings. In this chapter I'll describe the goals and basic design of the system and some of the implementation details. In particular, I'll describe how to build writers for use of the different kinds of surfaces. In some cases, I'll optimize the living hell out of them because it is possible for a glyph renderer to take multiple scanlines for a single character if you don't pay attention. And yes, this will be done in assembly.
I'll also show how you can add some basic scripting to change cursor positions, colors and even fonts dynamically. A few years ago, Wintermute changed the standard C library in devkitARM to make the stdio routines accessible for GBA and NDS. I'll also show how you can make use of this.
And, of course, there will be demos. Oh, will there be demos. There are about 10 of them in fact, so I'm going to do things a little bit differently than before: there will be one project containing a menu with all the examples. Not all examples will be shown here because that'd just be too much.
Lastly, it is expected that by now you have a decent knowledge of GBA programming, so I'm going to keep the amount of GBA-specific exposition to a minimum. When you see functions used that haven't been covered already, turn to GBATek, the project's code or tonclib's code for details.
The following list has the things I most wanted in TTE:
m3_puts()
,
se_puts()
, obj_puts()
and such. This
worked, but it meant you had to use something different for the
different modes. In TTE, there are different initializers for
the different modes to set up the system, and a single string
writer tte_write()
that just works.
printf()
support. For rather obvious reasons.
All the relevant information for TTE is kept gathered in three structs:
a text context, TTC
;
a font description, TFont
;
and a graphic surface description, TSurface
.
The TTC
struct contains the main parameters for the
engine: information about the surface were rendering to, cursor
positions, font information, color attributes and a few other things.
It also contains two callbacks for drawing and erasing glyphs.
The TFont
struct has a pointer to the glyph data,
glyph/cell dimensions and a few other things. There are also pointers
to width and height tables to allow variable width and height fonts.
I've hacked a TFont
creator into
usenti a while
back so that I could easily create these things from standard fonts,
but you can also make your own from scratch.
The TSurface
struct actually has nothing to do with text.
Instead, it's a struct describing the kind of surface we're rendering
on. This can be bitmaps, tiles, tilemaps or whatever. Tonclib has
basic pixel, line and rectangle routines for dealing with these
surfaces, so I might as well use them.
//# From tonc_tte.h : main TTE types. typedef struct TFont { const void *data; //!< Character data. const u8 *widths; //!< Width table for variable width font. const u8 *heights; //!< Height table for variable height font (mostly unused). u16 charOffset; //!< Character offset u16 charCount; //!< Number of characters in font. u8 charW; //!< Character width (fwf). u8 charH; //!< Character height.(fhf). u8 cellW; //!< Glyph cell width. u8 cellH; //!< Glyph cell height. u16 cellSize; //!< Cell-size (bytes). u8 bpp; //!< Font bitdepth; u8 extra; //!< Padding. Free to use. } TFont; //! TTE context struct. typedef struct TTC { // Members for renderers TSurface dst; //!< Destination surface. s16 cursorX; //!< Cursor X-coord. s16 cursorY; //!< Cursor Y-coord. TFont *font; //!< Current font. u8 *charLut; //!< Character mapping lut, if any. u16 cattr[4]; //!< ink, shadow, paper and special color attributes. // Higher-up members u16 reserved; u16 ctrl; //!< BG control flags. u16 marginLeft; u16 marginTop; u16 marginRight; u16 marginBottom; s16 savedX; s16 savedY; // Callbacks and table pointers fnDrawg drawgProc; //!< Glyph render procedure. fnErase eraseProc; //!< Text eraser procedure. const TFont **fontTable; //!< Pointer to font table for const char **stringTable; //!< Pointer to string table for } TTC;
//# Supporting types //! Glyph render function format. typedef void (*fnDrawg)(int); //! Erase rectangle function format. typedef void (*fnErase)(int left, int top, int right, int bottom); typedef struct TSurface { u8 *data; //!< Surface data pointer. u32 pitch; //!< Scanline pitch in bytes. u16 width; //!< Image width in pixels. u16 height; //!< Image width in pixels. u8 bpp; //!< Bits per pixel. u8 type; //!< Surface type. u16 palSize; //!< Number of colors. u16 *palData; //!< Pointer to palette. } TSurface;
Fig 22.1 shows a character sheet
that TFont
can use. The sheet is a matrix of
cells and each cell contains a character. The cellW/H
members are the dimensions of these cells; cellSize
is the
number of bytes per cell.
Each cell has one glyph, but the actual glyphs can be smaller
than the cells (white vs magenta parts). This does waste a bit of memory,
but it also has several benefits. One of the benefits is that
you can use cellSize
to quickly find the address of any
given glyph. Second, because I want by fonts to be usable for both
bitmaps and tiles, my glyph boxes would have to be multiples of
8 anyway. Additionally, this particular font will be 1bpp, meaning that
even with the wasted parts I'll still have a very low memory footprint
(3.5kB).
For fixed-width or fixed-height fonts, members charW
and charH
denote the actual character width and height.
For fonts of variable widths, the widths
member
points to the a byte-array containing the widths of the glyphs and
something similar is true for the heights
.
charOffset
is the (ASCII) character the data starts at.
Font sheets often start at a space (' '), so this tends to be 32.
charCount
is the number of characters and can be used
if you need to copy the whole sheet to VRAM (like in the case of
tile-mapping).
Please note that how the data in a TFont
is used depends
almost entirely on the glyph renderer. Most renderers that come with
tonclib expect this format:
cellSize
bytes between each glyph. This is similar to
how 1D object work with one important difference:
There are exceptions to this, but most renderers presented here will use this format. If you want to make your own renderers, you're free to use any format for the data you think is appropriate.
The text context, TTC
, contains the most important data
of the system. Starting at the top: the surface, dst
.
This defines the surface we're rendering to. The most relevant items
there are its memory address, pitch: the number of bytes per
scanline. The pitch is a very important parameter for
rendering, more important than the width and height in fact. The
surface also has palette members, which can be used to access its
colors. Much like the TFont
members, how this data is
used largely depends on the renderer.
The members cursorX/Y
are for the current cursor position.
The margin
rectangle indicates which part of the screen
should be used for text. If the cursor exceeds the right margin, it
will be moved to the left margin and one line down. The margins are
also used for screen-clears and returning to the top of the page.
The cattr
table is something special. Its entries are
color attributes. Parameters for ink (foreground color),
shadow, paper (background color) are put here, along with a
‘special’ field which is very much context-specific. Note
that these color attributes do not necessarily represent colors. In
modes 3 and 5 they're colors, but for mode 4 and tile writers they're
color indices. There's probably a nicer name for this than
‘color attribute’, but sodomy non sapiens.
Rendering glyphs and erasing (parts of) the screen is done through
the callbacks drawgProc
and eraseProc
. The
idea is that you initialize the system with the routines appropriate
for your text format and TTE uses them to do the actual writing. I
should point out that using callbacks for rendering a single glyph
can have a significant overhead, especially for the simpler kinds of
text like tilemaps.
The state of the TTE system is kept in a TTC
variable
accessible through tte_get_context()
. All changes
to the system go through there. In some cases, it is
useful to have two sets of state and switch between them
when appropriate (like when you have two screens. Y hello thar,
NDS). For that you can use tte_set_context)
to
redirect the pointer.
TTC __tte_main_context; TTC *gp_tte_context= &__tte_main_context; //! Get the master text-system. INLINE TTC *tte_get_context() { return gp_tte_context; } //! Set the master context pointer. void tte_set_context(TTC *tc) { gp_tte_context= tc ? tc : &__tte_main_context; }
To print characters, you can use tte_putc()
and
tte_write()
.
//! Get the glyph index of character \a ch. INLINE uint tte_get_glyph_id(int ch) { TTC *tc= tte_get_context(); ch -= tc->font->charOffset; return tc->charLut ? tc->charLut[ch] : ch; } //! Get the width of glyph \a id. INLINE uint tte_get_glyph_width(uint gid) { TFont *font= tte_get_context()->font; return font->widths ? font->widths[gid] : font->charW; } //! Render a character. int tte_putc(int ch) { TTC *tc= tte_get_context(); TFont *font= tc->font; // (4) translate from character to glyph index uint gid= tte_get_glyph_id(ch); // (5) get width for cursor update int charW= tte_get_glyph_width(gid); if(tc->cursorX+charW > tc->marginRight) [[ simulate newline ]] // (6) Draw and update position tc->drawgProc(gid); tc->cursorX += charW; return charW; }
//! Render a string. /*! \param text String to parse and write. \return Number of parsed characters. */ int tte_write(const char *text) { int ch; uint gid, charW; const char *str= text; TTC *tc= tte_get_context(); while( (ch= *str++) != '\0' ) { // (1) Act according to character type switch(ch) { case '\n': [[ update cursorX/Y for newline ]]; break; case '\t': [[ update cursorX for tab ]]; break; default: // (2) more special thingies if(ch=='#' && str[0]=='{') // (2a) Command sequence { str= tte_cmd_default(str+1); break; } else if(ch=='\\' && str[0]=='#') // (2b) Escaped command ch= *str++; else if(ch>=0x80) // (2c) UTF8 character ch= utf8_decode_char(str-1, &str); // (3) draw character tte_putc(ch); } } return str - text; }
I've omitted the code for a few things here, the idea should be clear.
First, read a character. Then, check whether it's a special character
(new line, tab, formatting command) and if so, act accordingly. Because
tte_write()
supports UTF-8, we also check for that
and decode the string for a full UFT-8 character. After that's all done,
we pass the character on to tte_putc()
, which translates it
to a glyph index, draws the glyph and advances the cursor.
Note: the way described here is a method of doing things; it's not the method, because that doesn't actually exist. Several steps done here may be overkill for the kind of text you had in mind. For example, getting from the character to the glyph index is done by the font's character offset and a potential character look-up table, neither of which is strictly necessary. Likewise, wrapping at the edges may already be done in the string itself with newline characters. On the other hand, you might like more complex wrapping, text alignment, scrolling, and who knows what else. If you want these things, creating your own routine shouldn't be too difficult.
Some terms I use in TTE have a very specific meaning. Because the differences between terms can be subtle, it is important to define the term explicitly. Additionally, TTE uses several acronyms and abbreviations that need to be clarified.
ch
are characters and
gid
means glyph index. The input of the renderers is
the glyph index, and not the character.
TSurface
types.
Renderer types. Within each family there can be different renderers for different kinds of fonts and effects. For example, when rendering to an 8bpp bitmap (the bmp8 family), you can have different renderers for different font bitdepths (1bpp or 8bpp, for example) or glyph layouts (bitmapped or tiled). They can render some pixels transparently, or apply anti-aliasing. Or any combination of those. The point is there are a lot of options here.
Because
I really don't like names that span a whole line, I will use
abbreviations in the renderer's name to indicate what it does; see
table 22.2 for what they mean. For the
most part, the renderers will be for fonts with arbitrary width
and height, with a 1bpp tile-stripped glyphs, with they will
draw them transparently and re-coloring of pixels. This is
indicated by *_b1cts
.
Family | prefix | Initializer |
---|---|---|
Regular tilemap (mode 0/1) | se | void tte_init_se(int bgnr, u16 bgcnt, SCR_ENTRY se0, u32 colors, u32 bupofs, const TFont *font, fnDrawg proc); |
Affine tilemap (mode 1/2) | ase | void tte_init_ase(int bgnr, u16 bgcnt, u8 ase0, u32 colors, u32 bupofs, const TFont *font, fnDrawg proc); |
4bpp Tiles (modes 0/1/obj) | chr4(c/r) | void tte_init_chr4(c/r)(int bgnr, u16 bgcnt, u32 cattrs, u32 colors, const TFont *font, fnDrawg proc); |
8bpp bitmap (mode 4) | bmp8 | void tte_init_bmp(int vmode, const TFont *font, fnDrawg proc); |
16bpp bitmap (mode3/5) | bmp16 | void tte_init_bmp(int vmode, const TFont *font, fnDrawg proc); |
objects | obj | void tte_init_obj(OBJ_ATTR *dst, u32 attr0, u32 attr1, u32 attr2, u32 colors, u32 bupofs, const TFont *font, fnDrawg proc); |
Code | Description |
---|---|
bx | Bitdepth of source font. (b1 = 1 bpp) |
wx | Specific width (w8 = width 8) |
hx | Specific height (h8 = height 8) |
c | Re-coloring. Color attributes are applied to the pixels in some way. |
t/o | Transparent or opaque paper pixels. |
s | Glyphs are in tile-strip format. |
Lastly, a note on some of the abbreviations I use in the rendering
code. A number of terms come up again and again, and I've taken
to use a shorthand notation for these items. The basic format is
fooX
where foo
is the relevant
bitmap/surface and X
is a one-letter code for things
like width, height, data and others. Yes, the use of single-letter
names is frowned upon and I don't advocate their use in general,
but I've found that in this particular case, if used judiciously,
they have helped me read my own code.
Term | Meaning |
---|---|
fooW | Width of foo |
fooH | Height of foo |
fooB | Bitdepth of foo |
fooP | Pitch of foo |
fooD | Primary data-pointer for foo |
fooL | Secondary data-pointer for foo |
fooS | Size of foo |
fooN | Number/count of foo |
Tilemapped text is the easiest to implement, because you don't really have to render anything at all. You simply load up all the font's tiles into a charblock and place screen-entries for the actual text.
The initializer for regular tilemaps is tte_init_se()
.
It's identical to
txt_init_se()
except for the two extra parameters at the end: font
and
proc
. These represent the font to use and the renderer
that does the surface manipulation. Every initializer in TTE has
those two parameters. It's safe to pass NULL to them if you're not
sure what to use; in that case, the default option for that family
will be used.
If font
is NULL, you'll get the default font. This is
either system8Font
for fixed-width occasions or verdana9Font
(fig 22.1) when variable width is suitable.
These can also be referenced via fwf_default
and
vwf_default
, respectively.
Each family also has a default renderer, #defined as
foo_drawg_default
, where foo is the family
prefix. The default renderers are the general routines, suitable for
all character widths and heights (fixed or variable fonts). Of course,
this does mean that they will be slower than routines written to work
with a specific glyph size. This is particularly true for
tilemapped text, and for that reasons specific
_w8h8
and _w8h16
versions are available there
as well.
The initializers tend to be long and boring, so I won't waste too much space on them here. Basically, they clear out the text context, assign some sensible values to the margins and surface variables, set up the font, the renderer and the eraser. They also fill some of the palette and color attributes.
The code I'll show in this chapter will mostly be about the renderers
themselves. Below you can see the code for the default screen-entry
writer, se_drawg_s()
, and the one specific for
8×8 fonts, se_drawg_w8h8
//! Character-plot for reg BGs, any sized, vertically tiled font. void se_drawg_s(uint gid) { int ix, iy; // (1) Get main variables. TTC *tc= tte_get_context(); TFont *font= tc->font; uint charW= (font->cellW+7)/8, charH= (font->cellH+7)/8; uint x0= tc->cursorX, y0= tc->cursorY; uint dstP= tc->dst.pitch/2; u16 *dstD= (u16*)(tc->data + (y0*dstP+x0)*2); // (2) Get the base tile index. u32 se= tc->cattr[TTE_SPECIAL] + gid*charW*charH; // (3) Loop over all tiles to draw glyph. for(ix=0; ix<charW; ix++) { for(iy=0; iy<charH; iy++) dstD[iy*dstP]= se++; dstD++; } } //! Character-plot for reg BGs using an 8x8 font. void se_drawg_w8h8(uint gid) { TTC *tc= tte_get_context(); uint x0= tc->cursorX, y0= tc->cursorY; uint dstP= tc->dst.pitch/2; u16 *dstD= (u16*)(tc->data + (y0*dstP+x0)*2); dstD[0]= tc->cattr[TTE_SPECIAL] + gid; }
Let's start with the simpler one: se_drawg_w8h8()
. An
8×8 glyph on a GBA tilemap simply means write a single screen-entry
to the right place. The right place here is derived from the
cursor position and the surface data (tc->dst
). The
‘special’ color attribute is used as a modifier to the
glyph index for things like palette swapping.
Note that the routine just handles plotting the glyph. Transforming from ASCII to glyph index and repositioning the cursor is all done elsewhere.
The more generalized routine, se_drawg_s()
is a little
more complex. It still starts by getting a pointer to the glyph's
destination, dstD
, and pitch (the distance to the next
line), dstP
. All renderers start with something like this.
All renderers also retrieve the character's width and height –
unless the sizes are specified in advance. The names I use for
rendering are always the same, so you should be able to tell what
means what even when the formulas for initializing them can be a tad
icky.
Anyway, after getting the pointer and pitch, the tile-index
for the top-left of the glyph is calculated and put this into
se
. After that, we loop over the different tiles of the
glyph in both directions. Note that the order of the loop is
column-major, not row-major, because that's the way the default
fonts were ordered.
As it happens, column-major rendering tends to be more efficient
for text, because glyphs are usually higher than they are wide.
Also, for tilemap text charW
and
charH
tend to be small – often 1 or 2. This
means that it is extremely inefficient to use loops; we'll
see how inefficient in subsection
22.7.5.. Unrolling them, like
se_drawg_w8h8()
and se_drawg_w8h16()
do, gives a much better performance.
void test_tte_se4() { irq_init(NULL); irq_add(II_VBLANK, NULL); REG_DISPCNT= DCNT_MODE0 | DCNT_BG0; // --- (1) Base TTE init for tilemaps --- tte_init_se( 0, // Background number (BG 0) BG_CBB(0)|BG_SBB(31), // BG control (for REG_BGxCNT) 0, // Tile offset (special cattr) CLR_YELLOW, // Ink color 14, // BitUnpack offset (on-pixel = 15) NULL, // Default font (sys8) NULL); // Default renderer (se_drawg_s) // --- (2) Init some colors --- pal_bg_bank[1][15]= CLR_RED; pal_bg_bank[2][15]= CLR_GREEN; pal_bg_bank[3][15]= CLR_BLUE; pal_bg_bank[4][15]= CLR_WHITE; pal_bg_bank[5][15]= CLR_MAG; pal_bg_bank[4][14]= CLR_GRAY; // --- (3) Print some text --- // "Hello world in different colors" tte_write("\n Hello world! in yellow\n"); tte_write(" #{cx:0x1000}Hello world! in red\n"); tte_write(" #{cx:0x2000}Hello world! in green\n"); // Color use explained tte_set_pos(8, 64); tte_write("#{cx:0x0000}C#{cx:0x1000}o#{cx:0x2000}l"); tte_write("#{cx:0x3000}o#{cx:0x4000}r#{cx:0x5000}s"); tte_write("#{cx:0} provided by \\#{cx:#}."); // --- (4) Init for 8x16 font and print something --- GRIT_CPY(&tile_mem[0][256], cyber16Glyphs); // Load tiles tte_set_font(&cyber16Font); // Attach font tte_set_special(0x4100); // Set special to tile 256, pal 4 tte_set_drawg(se_drawg_w8h16); // Attach renderer tte_write("#{P:8,80}Also available in 8x16"); key_wait_till_hit(KEY_ANY); }
The code above demonstrates a few of the things you can do with TTE
for tilemaps. The call to tte_init_se()
initializes the
system to display text on BG 0, using charblock 0 and screenblock 31
and to use the default font and renderer.
Parameter five is the bit-unpack offset; by setting it to 14,
all the 1-valued pixels in the font move to 14+1=15, the last index in
a palette bank. I'm also setting a few other colors so that the palette
will look like fig 22.3b.
In step 3, I print some text with tte_write()
. The
different colors are done by using #{cx:num}
in
the string, which sets the special color-attribute to num. More
on these kinds of commands in section 22.7.. Since
the se
-renderers add this value to the glyph index for
the final output, it can be used for palette swapping.
Step 4 demonstrates how to load up and use a second font.
The cyber16Font
is a rendition of the 8×16 font
used in ye olde SNES game, Cybernator (see
fig 22.2). This font was exported as
4bpp data so I can just copy it into VRAM directly, but I do need to
use an offset because I want to keep the old font as well. The
charblock now has two sets of glyphs
(see fig 22.3c).
![]() Fig 22.2: Cybernator font: 8×16. |
![]() Fig 22.3a: test_tte_se4 output.
|
![]() Fig 22.3b: Palette. |
![]() Fig 22.3c: Tileset. |
In principle, all I need to do to use a different font is to select
it with tte_set_font()
, but since the tiles are
at an offset, I also need to adjust the special color attribute.
The value of 0x4100 is used here to account for the offset (0x0100)
as well as the palette-bank (0x4000). I'm also selecting a different
renderer for the occasion, although that's mostly for show here
because the default renderer can handle 8×16 fonts just as well.
After that, I just call tte_write()
again to print a new
string in using the new font.
Text for affine tilemaps works almost the same as for regular
tilemaps; you just have to remember the differences between the
two kinds of backgrounds, like map size and available bitdepth.
The functions' prototypes are the same, except that se
is replaced by ase
.
Internally, the only real difference is in what the renderers are
to output, namely bytes instead of halfwords.
And here we run into that quaint little fact
of VRAM again: you can't write single bytes to VRAM. This means that the
renderers will be a little more complicated. But only a little:
simply call a byte-plotting routine for the screen-entry
placement. Because affine maps are essentially 8bpp bitmap surfaces,
I can use the standard plotter for 8bpp bitmap surfaces:
_sbmp8_plot()
. Aside from this one difference, the
ase_
renderers are identical to the se_
counterparts.
//! Character-plot for affine BGs using an 8x8 font. void ase_drawg_w8h8(uint gid) { TTC *tc= tte_get_context(); u8 se= tc->cattr[TTE_SPECIAL] + gid; _sbmp8_plot(&tc->dst, tc->cursorX/8, tc->cursorY/8, se); } //! Character-plot for affine BGs, any sized, vertically oriented font. void ase_drawg_s(int gid) { TTC *tc= tte_get_context(); TFont *font= tc->font; uint charW= (font->cellW+7)/8, charH= (font->cellH+7)/8; uint x0= tc->cursorX/8, y0= tc->cursorY/8; u8 se= tc->cattr[TTE_SPECIAL] + gid*charW*charH; int ix, iy; for(ix=0; ix<charW; ix++) for(iy=0; iy<charH; iy++, se++) _sbmp8_plot(&tc->dst, ix+x0, iy+y0, se); }
The demo for affine map text is text_tte_ase()
. The idea
is simple here: set up the text for a 256×256 pixel map,
write some text onto it and rotate the background to illustrate that
it is indeed an affine background. The center of rotation is the
“o” at the center of the screen. To place it there, I've
used the #{P:x,y}
code; this sets the cursor to the
absolute position given by (x, y). The other string is also
positioned on the map in this manner.
void test_tte_ase() { irq_init(NULL); irq_add(II_VBLANK, NULL); REG_DISPCNT= DCNT_MODE1 | DCNT_BG2; // Init affine text for 32x32t bg tte_init_ase( 2, // BG number BG_CBB(0) | BG_SBB(28) | BG_AFF_32x32, // BG control 0, // Tile offset (special cattr) CLR_YELLOW, // Ink color 0xFE, // BUP offset (on-pixel = 255) NULL, // Default font (sys8) NULL); // Default renderer (ase_drawg_s) // Write something tte_write("#{P:120,80}o"); tte_write("#{P:72,104}Round, round, #{P:80,112}round we go"); // Rotate it AFF_SRC_EX asx= { 124<<8, 84<<8, 120, 80, 0x100, 0x100, 0 }; bg_rotscale_ex(®_BG_AFFINE[2], &asx); while(1) { VBlankIntrWait(); key_poll(); asx.alpha += 0x111; bg_rotscale_ex(®_BG_AFFINE[2], &asx); if(key_hit(KEY_START)) break; } }
test_tte_ase
.
Bitmap text rendering is a little different from map text and can range in difficulty from easy to insane, depending on your wishes. At its core, though, it's always the same process: loop over all pixels and draw them on the destination surface. For example, a generic glyph renderer that draws pixels transparently could look something like this.
// Pseudo code for a general glyph printer. void foo_drawg(uint gid) { TTC *tc= tte_get_context(); TFont *font= tc->font; // Drawing with color keying. // Loop over all pixels. If the glyph's pixel is not zero, draw it. // Other wise, nevermind. for(iy=0; iy<tte_get_glyph_height(gid); iy++) { for(ix=0; ix<tte_get_glyph_width(gid); ix++) { u16 color= font_get_pixel(font, gid, ix, iy); if(color != 0) foo_plot(&tc->dst, tc->cursorX+ix, tc->cursorY+iy, color); } } }
Here, foo
can mean any rendering family.
foo_plot()
is a general pixel plotter and
font_get_pixel()
a pixel retriever. The implementations
of those functions depend on the specifics of the font and surface,
but the glyph renderer doesn't need to know about that.
This next function is an example of a 16bpp font to 16bpp bitmap printer.
//! Glyph renderer from bmp16 glyph to bmp16 destination. void bmp16_drawg(uint gid) { // (1a) Basic variables TTC *tc= tte_get_context(); TFont *font= tc->font; u16 *srcD= (u16*)(font->data+gid*font->cellSize), *srcL= srcD; uint charW= tte_get_glyph_width(gid) uint charH= tte_get_glyph_height(gid); uint x0= tc->cursorX, y0= tc->cursorY; uint srcP= font->cellW, dstP= tc->dst.pitch/2; u16 *dstD= (u16*)(tc->dst.data + (y0*dstP + x0)*2); // (2) The actual rendering uint ix, iy; for(iy=0; iy<charH; iy++) { for(ix=0; ix<charW; ix++) if(srcD[ix] != 0) dstD[ix]= srcD[ix]; srcD += srcP; dstD += dstP; } }
Blocks 1a and 1b set up the main variables to use in the loop. The
most important are the source and destination pointers,
srcD
and dstD
, and their pitches,
srcP
and dstP
. Notice that the source pitch,
srcP
, is the not the character width, but the cell width,
because the fonts are organized on a cell-grid. The code at point 2
selectively copies pixels from the font to the surface.
You may wonder why bmp16_drawg()
doesn't follow the
pattern in the earlier foo_drawg()
more closely.
The answer is, of course, performance. And before anyone quotes
Knuth on me: not every effort to make your code fast is premature
optimization. When you can improve the speed of the code without
spending too much effort of a loss of readability, there's not
much reason not to.
In this case, the optimizations I've applied here fall into two categories: local variables and pointer arithmetic. These techniques – that every C programmer should know – managed to boost the speed by a factor of 5.
Let's start with pointers. I'm using pointer to my advantage in
two places here. First, instead of using the font-data pointer
and destination pointer directly, I create pointers
srcD
and dstD
and direct them the
top-lefts of the glyph and where it will be rendered to. Short-circuiting
the accesses like this means that I don't have to apply extra offsets
to get to where I want to go in the loop. This will be both faster and
in fact more readable as well, because the loops won't contain any
non-essential expressions.
//# Example of a more standard bitmap copier. for(iy=0; iy < charH; iy++) for(ix=0; ix < charW; ix++) dstD[iy*dstP + ix]= srcD[iy*srcP+ix];
A second point here is using incremental offsets instead of the
y*pitch+x
form (see above). I suppose this is mostly a
matter of preference, but avoiding the wholly unnecessary
multiplications does matter.
The second optimization is local variables. By this I mean loading variables that reside in memory (globals and struct members) or oft-used function results in local temporaries. This may seem like a silly thing to point out, but the amount of time you can save with this is actually quite high.
Consider the use of tte_get_glyph_width()
here.
I know the width of a glyph won't change during the loop, so
calling a function to get the width in the loop-condition itself
is just stupid. Another example of this would be calling
strlen()
when looping over all characters in a string.
For those who do this: NO! Bad programmer, bad! Save the value
in a local variable and use that instead.
The other point is to pre-load things like globals and struct/class members if you use them more than once. Consider the following code. It's the same as the one given before, only now I have not loaded character height and the pitches into local temporaries.
//# Another bitmap-copy example. DO NOT USE THIS !!! for(iy=0; iy < font->charH; iy++) for(ix=0; ix < charW; ix++) dstD[iy*tc->dst.pitch/2 + ix]= srcD[iy*font->cellW + ix];
The result: the speed of the function was halved! I expected it to be slower, but that this innocuous-looking modification would actually cost me a factor two was quite a surprise to me.
So yes, spam your code width locals for loop-invariant, memory-based quantities. This avoids them being loaded from memory every time. As a bonus, the loops themselves win contain less text and be more generalized, making it more reusable.
Both the pointer work and pre-loading variables are actually the job of the compiler's optimizer, but the current version of GCC doesn't do these very well or at all. Also, sometimes it can't do this optimization. When functions are called between memory dereferences, the compiler has to reload the data because those functions may have changed their contents. Obviously, this wouldn't happen for locals.
Struct members and global variables live in memory, not CPU registers. Before the CPU can use their data, they have to be loaded from memory into registers first, and this often happen more times then necessary. Since memory access (especially ROM access) is slower than register access, this can really bog down an algorithm if the thing is used more than once. You can avoid this needless work by creating local variables for them.
Aside from the speed-boost, local variables can use shorter names, resulting in shorter and more readable code in the parts that actually do the work. It's win-freakin'-win, baby.
The renderer described above assumes that the glyphs are in formatted as 16-bpp bitmaps. However, TTE's default fonts are in a 1-bpp tilestrip format, so I'll have to use something else. Before I go into the details of that function, I'd like to discuss the different glyph formats and why I'm using tile-strips instead of just plain bitmaps.
When I say glyph formats, what I really mean is the order in which pixels are accessed. Three key variations exist.
Fig 22.5 shows these three layouts, including the loop structure and the order in which the pixels are traversed for 1bpp fonts. The case is a little bit different because of the bit-packing: 1 bpp means 8 pixels per byte. As a result, the bitmap-x loops have to be broken up into groups of 8, so that the bitmap format now uses three nests of loops instead of just two. There is no difference for the tiled formats, as those are grouped by 8 pixels anyway. Not only that, if you were to calculate the total loop-overhead for commonly used glyph sizes it turns out that this arrangement actually works particularly well for tile-strips. This is left as an exercise for the reader (hint: count the number of comparisons).
The next routine takes 1-bpp tile-strip glyphs and turns them
into output suitable for 16-bpp bitmap backgrounds. The output will
use the ink attribute for color and it will only draw a pixel if the bit
was 1 in the source data, giving us transparency. The three macros at the
top declare and define the basic variables, comparable to step 1a
in bmp16_drawg()
.
void bmp16_drawg_b1cts(uint gid) { // (1) Basic variables TTE_BASE_VARS(tc, font); TTE_CHAR_VARS(font, gid, u8, srcD, srcL, charW, charH); TTE_DST_VARS(tc, u16, dstD, dstL, dstP, x0, y0); uint srcP= font->cellH; dstD += x0; u32 ink= tc->cattr[TTE_INK], raw; // (2) Rendering loops. uint ix, iy, iw; for(iw=0; iw<charW; iw += 8) // loop over tile-strips { dstL= &dstD[iw]; for(iy=0; iy<charH; iy++) // loop over tile-lines { raw= srcL[iy]; for(ix=0; raw>0; raw>>=1, ix++) // loop over tile-scanline (8 pixels) if(raw&1) dstL[ix]= ink; dstL += dstP/2; } srcL += srcP; } }
The routine starts by calls to three macros:
TTE_CHAR_VARS()
, TTE_CHAR_VARS
and
TTE_DST_VARS()
. These declare and define most of the
relevant local variables, similar to step 1a in
bmp16_drawg()
. Note that two of the arguments here are
datatype identifiers for the source and destination pointers,
respectively. srcD
and srcL
will initially
point to the start of the source data. The other pointers,
dstD
and dstL
, point to the start
of the scanline in the destination area. They haven't been
corrected for the x-position just yet; that's done right
after it.
The reason I'm using two pairs of pointers here (a main data
pointer, fooD
and a line pointer
fooL
) is because of the pointer arithmetic. The
data-pointer stays fixed and the line-pointer moves around in the
inner loop.
The tile-strip portion of fig 22.5
illustrates how the routine moves over all the pixels. Because it's
for a 1-bpp bitpacked font and because there are 8 pixels per
tile-line, we can get an entire line's worth of pixels in one
byte-read. Rendering transparently gives us a nice chance for
optimization as well: if the tile-line is empty (i.e.,
raw
==0), we have no more visible pixels in that
line and we can move on to the next. A glance at the verdana 9
font in fig 22.1, will tell you
that you may be able to skip 50% of the pixels because of this.
The 8 bpp counterpart of the previous function is called
bmp8_drawg_b1cts()
, and is given below. The code is very
similar to the 16 bpp function, but because the pixels are now bytes
there are a few differences in the details.
void bmp8_drawg_b1cts(uint gid) { // (1) Basic variables TTE_BASE_VARS(tc, font); TTE_CHAR_VARS(font, gid, u8, srcD, srcL, charW, charH); TTE_DST_VARS(tc, u16, dstD, dstL, dstP, x0, y0); uint srcP= font->cellH; dstD += x0/2; u32 ink= tc->cattr[TTE_INK], raw, px; uint odd= x0&1; // (2) Source offset. uint ix, iy, iw; for(iw=0; iw<charW; iw += 8) // Loop over strips. { dstL= &dstD[iw/2]; for(iy=0; iy<charH; iy++) // Loop over lines. { raw= srcL[iy]<<odd; // (3) Apply source offset. for(ix=0; raw>0; raw>>=2, ix++) // Loop over pixels. { // (4) 2-bit -> 2-byte unpack, then used as masks. px= ( (raw&3)<<7 | (raw&3) ) &~ 0xFE; dstL[ix]= (dstL[ix]&~(px*255)) + ink*px; } dstL += dstP/2; } srcL += srcP; } }
The only real difference with bmp16_drawg_b1cts
is in the
inner-most loop. The no-byte-write issue for VRAM means that we need
to write two pixels in one pass. To do this, I retrieve and unpack
two bits into two bytes and use them to create the new pixels and the
pixel masks. The first line in the inner loop does the unpacking. It
transforms the bit-pattern ab
into
0000 000a 0000 000b
. Both bytes
in this halfword are now 0 or 1, depending on whether a
and b were on or off. By multiplying with ink
and 255, you can get the colored pixels and the appropriate mask
for insertion.
# 2-bit to 2-byte unpacking. 0000 0000 hgfe dcba p = raw (start) 0000 0000 0000 00ba p &= 3 0000 000b a000 00ba p |= p<<7; 0000 000b 0000 000a p &= ~0xFE;
Preparing the right halfword is only part of the work. If
cursorX
(i.e., x0
) is odd, then the glyph
should be plotted to an odd starting location as well. However,
the destination pointer dstL
is halfword pointer and
these must always be halfword aligned. To take care of this, note
that unpacking the pattern ‘abcd efgh
’
to an odd boundary is equivalent to unpacking
‘a bcde fgh0
’ to an even
boundary. This is exactly what the extra shift by odd
is for.
For the demo of this section, I'd like to use a technique called sub-pixel rendering. This is a method for effectively tripling the horizontal resolution for rendering by ‘borrowing’ colors from other pixels.
Consider the letter ‘A’ as shown in fig 22.6a. As you know, each pixel is composed of three colors: red, green and blue. These are the sub-pixels. The letter on the sub-pixel grid looks like fig 22.6b. Notice how the colors are still grouped by pixels, which on the sub-pixel grid gives very jagged edges. The trick to sub-pixel rendering is to shift groups of sub-pixels left or right, resulting in smoother edges (fig 22.6c). Now combine the pixels to RGB colors again to get fig 22.6d. Zoomed in as it is in fig 22.6, sub-pixel rendering may not look like much, but when used in the proper size the effects can be quite stunning.
Sub-pixel rendering isn't useful for everything. Because it muddles the concept of pixel and color a little, it's only useful for gray-scale images. This does make it great for text, of course. Secondly, the order in which the sub-pixels are ordered also matters. The process shown in fig 22.6 will work for RGB-ordered screens, but would fail quite spectacularly when the pixels are BGR-ordered. Going into all the gritty details it too much to do here, so I'll refer you to http://www.grc.com/ctwhat.htm, which explains the concept in more detail and gives a few examples too.
JanoS (http://www.haluz.org/yesh/) has created nice 4×8 sub-pixel font for use on GBA and NDS (see fig 22.7). A width of 4 is really tiny; it's impossible to have glyphs of that size with normal rendering and still have readable text. With sub-pixel rendering, however, it still looks good and now you can have many more characters on the screen than usual.
The output of the demo can be seen in fig 22.8. Because sub-pixel rendering is so closely tied to the hardware you're viewing with, it will probably look crummy on most screens or paper. You really have to see it on a GBA screen for the full effect.
In this particular case, I've converted the font to work with
bmp16_drawg()
: a 16bpp font in bitmap layout. Creating
an 8-bit version would not be very hard either. A 1-bpp bitpacked font
would of course be impossible because the font has more than two colors.
To make sub-pixel fonts look right, you'll actually need a lot of
colors: one for each combination of R,G,B, and with difference shades
of each. That said, JanoS has managed to reduce the amount of colors to
20 here without too much loss in quality. If anyone wants it, I also
have a 15-color version to use with 4bpp fonts.
//! Testing a bitmap renderer with JanoS' sub-pixel font. void test_tte_bmp16() { irq_init(NULL); irq_add(II_VBLANK, NULL); REG_DISPCNT= DCNT_MODE3 | DCNT_BG2; tte_init_bmp(3, &yesh1Font, bmp16_drawg); tte_init_con(); const char *str= "http://en.wikipedia.org/wiki/Subpixel_rendering :\n" "Subpixel rendering is a way to increase the " "apparent \nresolution of a computer's liquid crystal " "display (LCD).\nIt takes advantage of the fact that " "each pixel on a color\nLCD is actually composed of " "individual red, green, and\nblue subpixel stripes to " "anti-alias text with greater\ndetail.\n\n" " 4x8 sub-pixel font by JanoS.\n" " http://www.haluz.org/yesh/\n"; tte_write(str); key_wait_till_hit(KEY_ANY); }
Object text is useful if you want the characters to move around a bit, or if you simply don't have any room on a background. There are a few possibilities for object text. The most obvious one is to load all the characters into object VRAM and set the tile-indices of the objects to use the right tiles. This is what the TTE object system uses.
In many ways, this kind of object text is similar to tilemap text.
The tiles are loaded up front and you change the relevant mapping
entries (in this case attr2
of the objects) to the right
number. Of course, there are some notable differences as well.
For one thing, the positions of the characters must be written to the objects. But not only that, the objects also need to know how big they're supposed to be, and whether they have any other interesting qualities like rotation and palettes. For that reason, I've chosen to use the color attributes 0, 1 and 2 to store the object attributes 0, 1 and 2.
Another problem is which objects to use and how many. This last one could present a big problem, actually, because you may also want to use objects for normal sprites and it would be a really bad idea if they were suddenly overridden by the text system.
For the latter issue, I use (or perhaps abuse) the dst
member of the context. Each glyph is represented by an object, so
I'll need an object array, but I'm doing it with a little twist.
I'm going to start at the end of the array, so that the
lower objects can still be used for sprites as normal. Essentially,
I'm using OAM as an empty-descending stack. In this arrangement,
dst.data
points to the top of the stack (i.e., the last
element in the array), dst.pitch
is the index
to the current object, and dst.width
is the length
of the stack.
The default plotter of objects is obj_drawg
. Remember,
dst.pitch
is used as an index here and dst.data
is the top of the stack, so a negative index is used to get
the current object. After that, the coordinates and the correct
glyph index are merged with the color attributes to create the
final object.
//! Glyph-plotter using objects. void obj_drawg(uint gid) { TTC *tc= tte_get_context(); TFont *font= tc->font; uint x0= tc->cursorX, y0= tc->cursorY; // (1) find the right object, and increment index. uint id= tc->dst.pitch; OBJ_ATTR *obj= &((OBJ_ATTR*)tc->dst.data)[-id]; tc->dst.pitch= (id+1 < tc->dst.width ? id+1 : 0); // (2) Set object attributes. obj->attr0= tc->cattr[0] + (y0 & ATTR0_Y_MASK); obj->attr1= tc->cattr[1] + (x0 & ATTR1_X_MASK); obj->attr2= tc->cattr[2] + gid*font->cellW*font->cellH/64; }
And, yes, I know that this use of the dst
member is
somewhat … unorthodox; but it wasn't used
here anyway so why not. I am considering using something more
proper, but not just yet. Also, remember that this system assumes
that the font is already loaded into VRAM and that this can take up
a lot of the available tiles. Using the verdana 9 font,
that'd be 2*240 = 480 tiles. That's nearly half of object VRAM.
A safer alternative would be to load the necessary tiles dynamically,
but that would require more resource management.
The way object text is handled in TTE works, but it the implementation
is not exactly pretty. The way I'm using TTC.dst
here is,
well, bad. There is a good chance I'll clean it up a bit later, or at
the very least hide the implementation better.
The defining characteristic of objects is that they're separate from backgrounds; they can move around the screen independently. Object text is most likely used for text that is dynamic or has to travel along some sort of path. In this case, I'll make them fly on a parameterized path called a Lissajous curve (see fig 22.9).
The code is given below. After initializing the usual suspects,
tte_init_obj()
is called. The object stack starts at
the back of OAM, which is also what the system defaults to if
NULL
passed as the first parameter. The next three
are the object attributes. Because I want to use the default variable
width font, verdana 9, the attributes should be set to 8×16
objects. The bitdepth of the tiles will always be 4 to keep the
number of used tiles within limits. The rest of the initialization
should be easy to understand.
Making the string itself is done at step 2. Note that the string also set the paper color attribute (which corresponds to obj.attr2) to 0x1000 to make the “omg” red. After these few lines, the text handling itself is complete.
In step 3, the coordinates on the path are calculated. The
t
parameter indicates the how far along the path we are.
It is used to calculate the coordinates of each letter – the
first one using t
itself, and the rest are essentially
time-delayed. Don't be distracted by the magic numbers: the only
reason for their values is to make the effect look alright. Try
tweaking them a little to see what they do exactly.
// Object text demo void test_tte_obj() { // Base inits irq_init(NULL); irq_add(II_VBLANK, NULL); REG_DISPCNT= DCNT_MODE0 | DCNT_OBJ | DCNT_OBJ_1D; oam_init(oam_mem, 128); VBlankIntrWait(); // (1) Init object text, using verdana 9 (8x16 objects) OBJ_ATTR *objs= &oam_mem[127]; tte_init_obj( objs, // Start at back of OAM ATTR0_TALL, // attr0: 8x16 objects ATTR1_SIZE_8, // attr1: 8x16 objects 0, // attr2: nothing special CLR_YELLOW, // Yellow ink 0x0E, // ink pixel 14+1 = 15 &vwf_default, // Verdana 9 font NULL); // Default renderer (obj_drawg) pal_obj_bank[1][15]= CLR_RED; // (2) Write something (and prep for path) const char *str= "Parametrized object text, omg!!!"; const int len= strlen(str); tte_write("Parametrized object text, #{cp:0x1000}omg#{cp:0}!!!"); // Play with the objects int ii, t= 0x9000; while(1) { VBlankIntrWait(); key_poll(); // (3) Make lissajous figure for(ii=0; ii<len; ii++) { int ti= t-0x380*ii; // Get the path param for letter ii obj_set_pos(&objs[-ii], (96*lu_cos( ti)>>12)+120, // y= Ay*cos( t) + y0 (64*lu_sin(2*ti)>>12)+80); // x= Ax*sin(2*t) + x0 } t += 0x00A0; if(key_hit(KEY_START)) break; } }
Using tilemaps for text is nice, but will only work if the dimensions of the glyphs are multiples of 8. There are a few drawbacks in terms of readability: narrow characters such as ‘i’ will seem either overly wide, or be surrounded by many empty pixels. Also, you can't put many characters on a line because there are only so many tiles.
Variable-width fonts (vwf; also known as proportional fonts) solve this problem. Using variable-width fonts on bitmaps is quite easy, as shown in section 22.4.. However, using it in tilemap modes is a little trickier: how do you draw on a tilemap where the tiles are 8×8 in size?
Well, you don't. Not exactly. The key is not to draw to the map, but to the tiles that the map shows.
The usual way to work with tilemaps is that you load up a tileset, and then select the ones you want to show up on the screen by filling the tilemap. In those circumstances, the tileset is often static, with the map being updated for things like scrolling. Rendering to tiles reverses that procedure.
First, you need to set up a map where each entry points to a unique tile. This essentially forms a graphical surface out of the tiles, which you can then draw to like any other. The most obvious way to do this is to simply fill up the screenblock with consecutive numbers (see fig 22.10a). However, a better way to map the tiles is by mapping tiles in column-major order (see fig 22.10b) , for the same reason I chose it for the glyph format: the words in a column of tiles are consecutive.
![]() |
![]() |
Preparing the map is the easy part; the problem is knowing which part of which tile to edit to plot a pixel. First, you need to split the coordinates into tile coordinates and pixel-in-tile coordinates. This comes down to division and modulo by 8, respectively. Note that in column-major mode you only need to do this for the x coordinate. With this information, you can find the right word. The horizontal in-tile coordinate tells you which nybble in the word to update and at that point it's the usual bitfield insertion.
Tonclib has routines for drawing onto 4bpp, column-major tiles (referred to as chr4c mode). The plotter and the map preparation functions are given below, along with a demonstration routine to explain their use.
//# From tonc_schr4c.c //! Plot a pixel on a 4bpp tiled, column-major surface. void schr4c_plot(const TSurface *dst, int x, int y, u32 clr) { uint xx= x; // fluff to make x unsigned. u32 *dstD= (u32*)(dst->data + xx/8*dst->pitch); uint shift= xx%8*4; dstD[y] = (dstD[y] &~ (15<<shift)) | (clr&15)<<shift; } //! Prepare a screen-entry map for use with chr4c mode. void schr4c_prep_map(const TSurface *srf, u16 *map, u16 se0) { uint ix, iy; uint mapW= srf->width/8, mapH= srf->height/8, mapP= srf->pitch/32; for(iy=0; iy<mapH; iy++) for(ix=0; ix<mapW; ix++) map[iy*32+ix]= ix*mapP + iy + se0; } //# --- Simple test --------------------------------------------------- void test_chr4() { // (1) The usual irq_init(NULL); irq_add(II_VBLANK, NULL); REG_DISPCNT= DCNT_MODE0 | DCNT_BG0; REG_BG0CNT= BG_CBB(0) | BG_SBB(31); pal_bg_mem[1]= CLR_RED; pal_bg_mem[2]= CLR_GREEN; pal_bg_mem[3]= CLR_BLUE; pal_bg_mem[4]= CLR_WHITE; // (2) Define a surface TSurface srf; srf_init(&srf, SRF_CHR4C, // Surface type. tile_mem[0], // Destination tiles. SCREEN_WIDTH, // Surface width. SCREEN_HEIGHT, // Surface height. 4, // Bitdepth (ignored due to SRF_CHR4C). pal_bg_mem); // Palette. // (3) Prepare the map schr4c_prep_map(&srf, se_mem[31], 0); // (4) Plot some things int ii, ix, iy; for(iy=0; iy<20; iy++) for(ix=0; ix<20; ix++) schr4c_plot(&srf, ix+3, iy+11, 4); for(ii=0; ii<20; ii++) { schr4c_plot(&srf, ii+4, 12, 1); // Red line schr4c_plot(&srf, ii+4, ii+12, 2); // Green line schr4c_plot(&srf, 4, ii+12, 3); // Blue line } }
The pixel plotter starts by finding the tile-column that the desired
pixel is in. The column-index is simply x/8; this is multiplied
by the pitch to get a pointer to the top of the column. Note that
pitch is used a little different than usual. Normally, it denotes the
number of bytes to the next scanline, but in this case it's used as
the byte-offset to next tile-column. For a column-major mode, this
comes down to the height×bpp*8/8, but all that
is done in srf_init()
. Once you have the right tile,
the pixel you want is in the x%8th nybble, meaning
the required shift for the insertion is x%8*4. After that, it's
just a matter of inserting the color.
(For the curious: I'm casting
x to unsigned int first because division and modulo will then
be optimized to shifts/masks properly.)
The schr4c_prep_map()
function just initializes the
map in the order given in fig 22.10b.
Well, almost.I'm also adding a value to each screen-entry like I
usually do for palettes and tile-offsets.
The output of test_chr4()
can be seen in
fig 22.11a. It's a white rectangle
with red, green and blue lines, as expected.
Fig 22.11b is a picture taken from
VBA's tile viewer, showing how the contents of the surface. Doesn't
quite look what's on the screen, does it? Still, if you look closely,
you can figure out how it works. Each set of 20 tiles forms one
tile-column on the screen (indicated by yellow blocks). When you
place these tiles on top of each other, you'll see the picture of
fig 22.11a emerge.
![]() Fig 22.11a: chr4_test() output |
![]() Fig 22.11b: chr4_test() tiles. The yellow blocks indicate tiles of a single column. |
The easiest way to render glyphs to tiles is to follow the template from section 22.4.. This is done in the function below.
//! Simple version of chr4 renderer. void chr4_drawg_b1cts_base(uint gid) { TTE_BASE_VARS(tc, font); TTE_CHAR_VARS(font, gid, u8, srcD, srcL, charW, charH); uint x0= tc->cursorX, y0= tc->cursorY; uint srcP= font->cellH; u32 ink= tc->cattr[TTE_INK], raw; uint ix, iy, iw; for(iw=0; iw<charW; iw += 8) { for(iy=0; iy<charH; iy++) { raw= srcD[iy]; for(ix=0; raw>0; raw>>=1, ix++) if(raw&1) schr4c_plot(&tc->dst, x0+ix, y0+iy, ink); } srcD += srcP; x0 += 8; } }
Now, you may think that this runs pretty slowly thanks to all the
recalculations in schr4c_plot()
. And you'd be right,
but in truth, it's not as bad as I originally thought. It is possible
to speed it up by simply inlining things, but the real gain comes
from drawing pixels in parallel.
Instead of plotting pixel individually, you can also plot multiple
pixels simultaneously. The bmp8_drawg_b1cts()
renderer
we saw earlier did this: it unpacked 2 pixels and drew together.
In the case of 4bpp tiles, you can unpack the source byte into one
(32bit) word and plot eight pixels at once. The only downside
is that you'll probably have to split it over two tiles.
The next function is TTE's main glyph renderer for tiles, and it is
a doozy. There are two stages for the rendering in the inner loop:
bit unpacking the source byte, raw
and splitting the
prepared pixel px
into two adjacent tiles. These
correspond to steps 3 and 4, respectively.
Normally, the bitunpack is done in a loop, but sometimes it's
faster to do it in other ways. For details, see my document on
bit
tricks. The first five lines of step 3 do the unpacking. For
example, it turns a binary 0001 1011
into a
hexadecimal 0x00011011
. This is then multiplied by
15 and ink
to give the pixel mask pxmask
and the colored pixels px
, respectively.
Step 4 distributes the word with the pixels over two tiles if necessary. In step 1, left and right shifts were prepared to supply the bit offsets for this procedure. Now, for larger glyphs this will mean that certain destination words are used twice, but this can't be helped (actually it can, but the procedure is ugly and possibly not worth it). An alternative to this is using the destination once and read (and unpack/color) the source twice; however, as VRAM is considerably faster than ROM I doubt this would be beneficial.
//! Render 1bpp fonts to 4bpp tiles; col-major order. void chr4c_drawg_b1cts(uint gid) { // Base variables. TTE_BASE_VARS(tc, font); TTE_CHAR_VARS(font, gid, u8, srcD, srcL, charW, charH); uint x= tc->cursorX, y= tc->cursorY, dstP= tc->dst.pitch/4; uint srcP= font->cellH; // (1) Prepare dst pointers and shifts. u32 *dstD= (u32*)(tc->dst.data + (y + x/8*dstP)*4), *dstL; x %= 8; uint lsl= 4*x, lsr= 32-4*x, right= x+charW; // Inner loop vars. u32 px, pxmask, raw; u32 ink= tc->cattr[TTE_INK]; const u32 mask= 0x01010101; uint iy, iw; for(iw=0; iw<charW; iw += 8) // Loop over strips { // (2) Update and increment main data pointers. srcL= srcD; srcD += srcP; dstL= dstD; dstD += dstP; for(iy=0; iy<charH; iy++) // Loop over scanlines { raw= *srcL++; if(raw) { // (3) Unpack 8 bits into 8 nybbles and create the mask raw |= raw<<12; raw |= raw<< 6; px = raw & mask<<1; raw &= mask; px = raw | px<<3; pxmask= px*15; px *= ink; // (4a) Write left tile: dstL[0] = (dstL[0] &~ (pxmask<<lsl) ) | (px<<lsl); // (4b) Write right tile (if any) if(right > 8) dstL[dstP]= (dstL[dstP] &~ (pxmask>>lsr) ) | (px>>lsr); } dstL++; } } }
chr4c_drawg_b1cts()
is pretty fast. It certainly is
faster than the earlier version by about 33%. It's actually even
faster than the bmp8 renderer, but only by a slim margin.
Of course, you can always go one better. The various shifts and conditionals make it perfect for ARM code, rather than Thumb. And to make sure it goes exactly according to plan, I'm doing this in assembly.
The next function is chr4_drawg_b1cts_fast()
, the
ARM assembly equivalent of version 2. There's an almost one-to-one
correspondence between the C and asm loops, so just loop to the
C version for the explanation.
Speed-wise, the asm version is much better than the C version. Even in ROM, which is very bad for ARM code, it is still faster than the Thumb version. There are one or two tiny details by which you can speed this thing up, but by and large this should be it for fonts of arbitrary dimensions. Of course, if you have fixed sizes for your font and do not require recoloring or transparency, things will be a little different.
// Include TTC/TFont member offsets plz. #include "tte_types.s" /* IWRAM_CODE void chr4c_drawg_b1cts_fast(int gid); */ .section .iwram, "ax", %progbits .arm .align .global chr4c_drawg_b1cts_fast chr4c_drawg_b1cts_fast: stmfd sp!, {r4-r11, lr} ldr r5,=gp_tte_context ldr r5, [r5] @ Preload dstBase (r4), dstPitch (ip), yx (r6), font (r7) ldmia r5, {r4, ip} add r3, r5, #TTC_cursorX ldmia r3, {r6, r7} @ Get srcD (r1), width (r11), charH (r2) ldmia r7, {r1, r3} @ Load data, widths cmp r3, #0 ldrneb r11, [r3, r0] @ Var charW ldreqb r11, [r7, #TF_charW] @ Fixed charW ldrh r3, [r7, #TF_cellS] mla r1, r3, r0, r1 @ srcL ldrb r2, [r7, #TF_charH] @ charH ldrb r10, [r7, #TF_cellH] @ cellH @ Positional issues: dstD(r0), lsl(r8), lsr(r9), right(lr), cursorX mov r3, r6, lsr #16 @ y bic r6, r6, r3, lsl #16 @ x add r0, r4, r3, lsl #2 @ dstD= dstBase + y*4 mov r3, r6, lsr #3 mla r0, ip, r3, r0 and r6, r6, #7 @ x%7 add lr, r11, r6 @ right= width + x%8 mov r8, r6, lsl #2 @ lsl = x%8*4 rsb r9, r8, #32 @ lsr = 32-x%8*4 ldr r6,=0x01010101 ldrh r7, [r5, #TTC_ink] @ --- Reg-list for strip/render loop --- @ r0 dstL @ r1 srcL @ r2 scanline looper @ r3 raw @ r4 px / tmp @ r5 pxmask @ r6 bitmask @ r7 ink @ r8 left shift @ r9 right shift @ r10 dstD @ r11 charW @ ip dstP @ lr split indicator (right edge) @ sp00 charH @ sp04 deltaS = cellH-charH (delta srcL) cmp r11, #8 @ Prep for single-strip render suble sp, sp, #8 ble .Lyloop @ Prep for multi-strip render sub r3, r10, r2 mov r10, r0 stmfd sp!, {r2, r3} @ Store charH, deltaS b .Lyloop @ --- Strip loop --- .Lsloop: @ (2) Update and increment main data pointers. ldmia sp, {r2, r3} @ Reload charH and deltaS add r10, r10, ip @ (Re)set dstD/dstL mov r0, r10 add r1, r1, r3 sub lr, lr, #8 @ --- Render loop --- .Lyloop: @ (3) Prep px and pxmask ldrb r3, [r1], #1 orrs r3, r3, r3, lsl #12 beq .Lnopx @ Skip if no pixels orr r3, r3, r3, lsl #6 and r4, r3, r6, lsl #1 and r3, r3, r6 orr r3, r3, r4, lsl #3 rsb r5, r3, r3, lsl #4 mul r4, r3, r7 @ (4a) Render to left tile ldr r3, [r0] bic r3, r3, r5, lsl r8 orr r3, r3, r4, lsl r8 str r3, [r0] @ (4b) Render to right tile cmp lr, #8 ldrgt r3, [r0, ip] bicgt r3, r3, r5, lsr r9 orrgt r3, r3, r4, lsr r9 strgt r3, [r0, ip] .Lnopx: add r0, r0, #4 subs r2, r2, #1 bne .Lyloop @ Test for strip loop subs r11, r11, #8 bgt .Lsloop add sp, sp, #8 ldmfd sp!, {r4-r11, lr} bx lr @ EOF
Bitpacked fonts will give you monochrome glyphs. If you want more colors – for shading or anti-aliasing – you'll need to use more bits. The code for this is nearly identical to the 1bpp bitpacked version; the most important differences being a different source datatype and an alternative method for finding the right mask. Oh, and you won't have to unpack the bits anymore, of course.
The following snippet shows how you can make a transparency mask out of a word of 4bit pixels. Essentially, you mask all the bits of a nybble together and mask out the other bits of that nybble. This gives 0 if the whole nybble was empty, or 1 is it wasn't. This can then again be multiplied by 15 to give the proper mask.
// Create pixel mask from 8x 4 bits u32 *srcL= ...; // Source is now 32bit. raw = *srcL++; // Source word: 8x 4 bits pxmask = raw; pxmask |= pxmask>>2; // bit0 = bit0 | bit2 pxmask |= pxmask>>1; // bit0 = bit0 | bit1 | bit2 | bit3; pxmask &= 0x11111111; // bit0 is 0 only if bits 0-3 were all 0 pxmask *= 15;
No, not shady characters; shaded characters. What you'll often see in games is that the text has either an outline or a bit of shading on one side. While it is possible to create shading with a 1bpp font, it's easier to simply build it into the font itself (see fig 22.12). Because this means more colors than 1bpp can handle, you may be tempted to use a 2bpp font here. However, unless you are really stressed for memory, it's more convenient to use 4bpp here as well.
At that point, you can follow the procedure described earlier. But by
cleverly using the bits that make up the shading, you can allow the
shadow color to be variable as well. For example, you can designate
bit 0 as the 'ink' bit, bit 1 as the 'shadow' bit and if necessary
bit 2 as the 'paper' bit. Then raw&0x11111111
gives
the 'ink' mask, and (raw>>1)&0x11111111
gives the 'shadow' mask; these can then be used to apply colors and
to create the full mask.
The following is a demonstration of how this can be done. Note that
each line here corresponds exactly to one ARM instruction, so this
should be an efficient method. Well, in ARM code anyway.
// Use bits 0 and 1 from each nybble to create masks and apply colors. u32 *srcL= ...; raw = *srcL++; // Source word: 8x 4 bits px = raw & 0x11111111; // Bit 0 for ink pixels raw = raw>>1 & 0x11111111; // Bit 1 for shadow pixels pxmask = px | raw; // Mask of ink and shadow bits pxmask *= 15; px = px * ink; // Color with ink px += raw* shadow; // Add shadow pixels
The chr4c_drawg_b4cts()
renderer uses this method to color
both the ink and shadow pixels. It's essentially
chr4c_drawg_b4cts()
except for the things in bold and the
removal of the bit unpacking. Also note the ‘no-pixel’
condition here. If pxmask
is zero, there's nothing to do;
and so we won't.
//! 4bpp font, tilestrips with ink/shadow coloring. void chr4c_drawg_b4cts(uint gid) { TTE_BASE_VARS(tc, font); TTE_CHAR_VARS(font, gid, u32, srcD, srcL, charW, charH); uint x= tc->cursorX, y= tc->cursorY; uint srcP= font->cellH, dstP= tc->dst.pitch/4; // (1) Prepare dst pointers and shifts. u32 *dstD= (u32*)(tc->dst.data + (y+x/8*dstP)*4), *dstL; x %= 8; uint lsl= 4*x, lsr= 32-4*x, right= x+charW; // Inner loop vars u32 amask= 0x11111111; u32 px, pxmask, raw; u32 ink= tc->cattr[TTE_INK]; u32 shade= tc->cattr[TTE_SHADOW]; uint iy, iw; for(iw=0; iw<charW; iw += 8) // Loop over strips { srcL= srcD; srcD += srcP; dstL= dstD; dstD += dstP; for(iy=0; iy<charH; iy++) // Loop over scanlines { raw= *srcL++; // (3a) Prepare pixel mask px = (raw & amask); raw = (raw>>1 & amask); pxmask= px | raw; if(pxmask) { px *= ink; // (3b) Color ink pixels px += raw*shade; // (3c) Color shadow pixels pxmask *= 15; // (3d) Create mask // (4a) Write left tile: dstL[0] = (dstL[0] &~ (pxmask<<lsl) ) | (px<<lsl); // (4b) Write right tile (if any) if(right > 8) dstL[dstP]= (dstL[dstP] &~ (pxmask>>lsr) ) | (px>>lsr); } dstL++; } } }
I've done a fair bit of profiling for these tile renderers and think I have a decent knowledge of which techniques will be efficient and which won't. These are some of my observations.
ldr
+and
pair, it will get clever and
avoid the load by splitting the mask out over multiple
byte-size masks. So instead of one instruction in the inner-loop,
you now have four. Perhaps even more, depending on how many extra
registers this takes. Note, this only happens for
constants and only for ARM-compiled code. A work-around is to
have the mask in a global variable to be loaded before the loops.
This is in part why I've hand-assembled some of the routines.
Please apply the standard disclaimer to this list. I've found these techniques to work for my cases, but they won't apply to every case. For example, other systems (*cough* NDS) will have different CPU architectures and memory characteristics, and that would affect the speed.
The situation depicted in fig 22.13 should be familiar. The key point here is that there is a background map, and a dialog box with text in it. This text is static, but the position in the top-left corner is continually updated as you scroll along the map.
The core function for this demo is test_tte_chr4()
.
The first thing it does is call tte_init_chr4c()
to
initialize the text system for chr4c-mode. The third argument is the
offset for map-entries: 0xF000
meaning it uses
sub-palette 15. The fourth is a word for the color attributes:
13 for the ink, 15 for the shadow and 0 for the others. For this
demonstration, I'm using the 4bpp version of verdana9 (see
fig 22.12) and the fast assembly
version to render the glyphs.
Step 2 loads the background map and the dialog box. Note that the dialog box is copied to the tiles that the text is rendered to. When the text is printed, this will show that the glyphs indeed are rendered transparently. This does more or less mean that I can't use the standard eraser, because that'd wipe the box as well.
The dialog text is drawn in step 3. The ci
and
cs
tags set the ink and shadow color attributes,
respectively. This makes the string “arrows” use
colors 1 and 2 (well, 0xF1 and 0xF2), and so forth.
//! Set up a rectangle for text, with the non-text layers darkened for contrast. void win_textbox(uint bgnr, int left, int top, int right, int bottom, uint bldy) { REG_WIN0H= left<<8 | right; REG_WIN0V= top<<8 | bottom; REG_WIN0CNT= WIN_ALL | WIN_BLD; REG_WINOUTCNT= WIN_ALL; REG_BLDCNT= (BLD_ALL&~BIT(bgnr)) | BLD_BLACK; REG_BLDY= bldy; REG_DISPCNT |= DCNT_WIN0; tte_set_margins(left, top, right, bottom); } //! Test chr4 shaded text renderer void test_tte_chr4() { irq_init(NULL); irq_add(II_VBLANK, NULL); REG_DISPCNT= DCNT_MODE0 | DCNT_BG0 | DCNT_BG2; // (1) Init for text tte_init_chr4c( 0, // BG number. BG_CBB(0)|BG_SBB(10), // BG control. 0xF000, // Screen-entry base bytes2word(13,15,0,0), // Color attributes. CLR_BLACK, // Ink color &verdana9_b4Font, // Verdana 9, with shade. (fnDrawg)chr4c_drawg_b4cts_fast); // b4cts renderer, asm version tte_init_con(); // Initialize console I/O // (2) Load graphics LZ77UnCompVram(dungeon01Map, se_mem[12]); LZ77UnCompVram(dungeon01Tiles, tile_mem[2]); LZ77UnCompVram(dungeon01Pal, pal_bg_mem); GRIT_CPY(&tile_mem[0][16*30], dlgboxTiles); GRIT_CPY(pal_bg_bank[15], dlgboxPal); // (3) Create and print to a text box. win_textbox(0, 8, 160-32+4, 232, 160-4, 8); CSTR text= "#{P}Scroll with #{ci:1;cs:2}arrows#{ci:13;cs:15}, " "quit with #{ci:1;cs:2}start#{ci:13;cs:15}\n" "Box opacity with #{ci:3;cs:4}L/R#{ci:7;cs:9}"; tte_write(text); // Reset margins for coord-printing tte_set_margins(8, 8, 232, 20); int x=128, y= 32, ey=8<<3; REG_BG2HOFS= x; REG_BG2VOFS= y; // Invisible map buildup! REG_BG2CNT= BG_CBB(2) | BG_SBB(12) | BG_REG_64x64; REG_DISPCNT= DCNT_MODE0 | DCNT_BG0 | DCNT_BG2 | DCNT_WIN0; while(1) { VBlankIntrWait(); key_poll(); // (4) Scroll and blend x = clamp(x + key_tri_horz(), 0, 512+1-SCREEN_WIDTH); y = clamp(y + key_tri_vert(), 0, 512+1-SCREEN_HEIGHT); ey= clamp(ey+ key_tri_shoulder(), 0, 0x81); REG_BG2HOFS= x; REG_BG2VOFS= y; REG_BLDY= ey>>3; // (5) Erase and print new position. tte_printf("#{es;P}%d, %d", x, y); if(key_hit(KEY_START)) break; } }
I'll close off this section with a word on the text box. If you look carefully, you'll see that it's semi-transparent. Or, to be precise, the text and the box itself are at normal intensity, but the background that it covers is darker than usual. Two things are necessary for this nice, little effect.
This is what win_textbox()
is for. The function also sets
the margins so that the text would wrap nicely inside the box.
The TTE context contains members that control positioning, colors,
fonts as well as a few other things. There are two approaches to
changing these parameters. The first is to hard code changes in the
state through direct member access or functions like
tte_set_ink()
. This works nice and fast, but isn't
very flexible. The second is to use formatting tags in the
strings themselves – the system parses the string for these tags
and interprets them accordingly. This is basically a form of scripting.
The tags that TTE uses look like this:
#{tag0:args; tag1:args}
The code itself is starts with `#{
' and ends with
`}
'. Each command consists of a tag, followed by a colon
and comma-separated arguments when appropriate. Multiple commands
can be separated by a semi-colon. For example,
`#{es; P:10,16}
' would clear the screen and set the
cursor to (10, 16).
Now, I could show you how to parse this, but the parser currently in use for this is, well, let's just say it's long and very ugly. Essentially, it's a massive switch-block (sometimes a double switch-block) with stuff like this:
char *tte_cmd_default(const char *str) { int ch, val; char *curr= (char*)str, *next; TTC *tc= tte_get_context(); while(1) { ch= *curr; next= curr+1; // (1) Check first character switch(ch) { // (2) --- Absolute Positions --- case 'X': tc->cursorX= curr[1]==':' // If there's an argument ... ? strtol(curr+2, &next, 0) // set cursor X to arg : tc->marginLeft; // else move to start of line. break; ... more cases ... // (3) Find EOS/EOC/token and act on it curr= tte_cmd_next(next); if(curr[0] == '\0') return curr; else if(curr[0] == '}') return curr+1; } }
Like I said, ugly; but it'll have to do for now. The incoming pointer
points to the first character past the '#{
'. The command
tags are all single or double-lettered; the switch looks for a
recognized letter and acts accordingly.
One of the tags is 'X
', which sets the absolute
X-coordinate of the cursor. The tc->cursorX
will be
set to the argument if it is present, or to the left margin if it
is not.
Note the use of
strtol()
here. This is a very interesting
function. Not only does it work for both decimal and hex strings,
but through the second argument you can retrieve a pointer to right
after the number in the string. Alternatives would be
sscanf()
or atoi()
, but
strtol()
is nicer.
After handling a tag, it'll look for more tags, or exit if the end delimiter or end of string is found.
Table 22.4 shows the available tags. Note that they are case-sensitive and some items can do more than one thing, depending on the number of parameters.
Code | Description |
---|---|
P | Reset position to top-left margin. |
Pr | Restore cursor position (see also Ps ). |
Ps | Save cursor position. |
P: x,y | Set cursor to coordinates (x, y). |
X | Reset cursorX to left margin. |
X: x | Set cursorX to x. |
Y | Reset cursorY to top margin. |
Y: y | Set cursorY to y. |
c[ispx]: cattr | Set ink (ci ), shadow (cs ), paper
(cp ) or special (cx ) color attribute to
cattr.
|
e[slbf] | Erase
the screen between margins (es ),
the current line (el ),
the current line up to the cursor (eb ; backwards),
the current line from the cursor (ef ; forwards).
|
er: l,t,r,b | Erase a rectangle given by (l,t) to (r,b). |
f: idx | Set font to TTC.fontTable[idx] . |
m[ltrb]: value | Set left (ml ), top (mt ),
right (mr ) or bottom (mb ) margin to value.
|
m: l,t,r,b | Set margins to rectangle (l,t) - (r,b) |
p: dx, dy | Move the cursor by (dx, dy). |
s: idx | Print the idx'th string in TTC.stringTable . |
w: count | Wait for count frames. |
x: dx | Move the cursor to the right by dx. |
y: dy | Move the cursor down by dy. |
I should point out that at present the commands are still
fragile, so be careful with this stuff. For example, the positioning
commands will simply move the cursor, but not clip to the margins.
Also take care with the font and string commands (f
and s
, respectively). tte_cmd_default()
doesn't test whether the index is out of the bounds of the arrays,
so you could end up with … odd things. At some point, I hope
to fix these things, but it's not a priority right now. If anyone
has something more robust that I can use, please speak up.
The current commands in TTE aren't exactly idiot-proof yet. If you stick to sensible things, it should work quite nicely. But it is still easy to shoot yourself in the foot if you're not careful.
Something like tte_write()
is nice for pure strings,
but what would really help is if you had something like
printf()
. In the old days (pre-2006),
printf()
, putc
and other console output
functions were unavailable, but Wintermute added a mechanism to
devkitArm's standard C library that allows it on consoles as well.
The key to this is the devoptab_t
struct, defined in
sys/iosupport.h
. This contains a table of function pointers
to device operations. The pointer we're interested in here is
write_r
; this is the function that printf()
et al. call for the final output.
// Partial devoptab_t definition typedef struct { const char *name; int structSize; int (*open_r)(struct _reent *r, void *fileStruct, const char *path, int flags,int mode); int (*close_r)(struct _reent *r,int fd); int (*write_r)(struct _reent *r,int fd,const char *ptr,int len); ... } devoptab_t;
The key to making the standard console routines work on a GBA is
to redirect the default write_r
for console output
to one of our own making. Before explaining how this works, I want
you to understand that this comes very close to black magic. It
involves descending to the roots of the library and there is next
to no documentation about how this stuff works. This story is
the closest thing I could find to a full description:
http://www.embedded.com/story/OEG20020103S0073, but this
isn't high on explanations either.
To put it in other way: you're in a cave; it's pitch black and there are grues about.
Now that that's done, let's continue. The first step is creating our
replacement writer. In TTE's case, this is
tte_con_write()
. It is almost identical to
tte_write()
, but has to fit in the format given by
devoptab_t.write_r
. It comes down to this:
//! internal output routine used by printf. /*! \param r Reentrancy parameter. \param fd File handle (?). \param text Text buffer containing the string prepared by printf. \param len Length of string. \return Number of output bytes (?) \note \a text is NOT zero-terminated!!!!one! */ int tte_con_write(struct _reent *r, int fd, const char *text, int len) { // (1) Safety checks if(!sConInitialized || !text || len<=0) return -1; int ch, gid, charW; const char *str= text, *end= text+len; // (2) check for end of text while( (ch= *str) != 0 && str < end) { str++; switch(ch) { // (3) --- VT100 sequence ( ESC[foo; ) --- case 0x1B: if(str[0] == '[') str += tte_cmd_vt100(str); break; //# (4) Other character cases. See tte_write() } } return str - text; }
While I've added documentation for the arguments here, it's mostly
based on guesswork. The r
parameter contains
re-entrancy
information, useful if you have multiple threads. Since the GBA is
a single-thread system, this should not concern us. I believe
fd
is a file handle of some sort, but since we're not
writing to files this again does not concern us.
The real arguments of interest are text
and
len
. The text
argument points to the buffer
with the string to render. In the case of printf()
, it's
the string after formatting: all codes like %d
are already done. And now for the most important part:
text
is not null-terminated. This is why there's
a length variable as well.
As far as I can tell, printf
uses a large buffer
(approximately 1300 bytes) on the stack to which it writes the
formatted numbers. This buffer isn't cleared you call it again, or
terminated by ‘\0’ when sent to the writer. This has
the following consequences:
printf()
from
interrupts, as the routine is slow and the things can start to
nest and clobber everything.
len
parameter. As the buffer isn't
zeroed, remnants of old data may still be there, and you get crap.
Aside from that, tte_con_write()
is straightforward.
As said, the contents of the loop are nearly identical to the one in
tte_write()
. The only real difference is point 3. This is
a test for VT100 formatting strings, which will be covered in the next
subsection.
To make use of the new writer, you have to hook it into the device list
somehow. First, create a devoptab_t
instance which the
writer in the right place. There is a list of device operations called
devoptab_list
. The devices of interest are the streams
stdout
and stderr
, which are entries
STD_OUT
and STD_ERR
in the list. Simply
point these entries to your own struct.
A second item is to set the buffers for these streams. I'm not sure this
is really necessary, but that's how it's done in libgba and its author
knows this system best so I'm not going to argue here. The function for
this is setvbuf()
. You find the required initialization
steps below.
static int sConInitialized= 0; const devoptab_t tte_dotab_stdout= { "ttecon", 0, NULL, NULL, tte_con_write, NULL, NULL, NULL }; //! Init stdio capabilities for TTE. void tte_init_con() { // attach our operations to stdout and stderr. devoptab_list[STD_OUT] = &tte_dotab_stdout; devoptab_list[STD_ERR] = &tte_dotab_stdout; // Set buffers. setvbuf(stderr, NULL , _IONBF, 0); setvbuf(stdout, NULL , _IONBF, 0); sConInitialized = 1; }
Calling tte_init_con()
activates stdio's functionality
so you can use printf()
and such. Note that the raw
printf()
is rather heavy and it also has floating point
options, which are rarely used in a GBA environment, if ever. For that
reason, you'll usually use its integer-only cousin,
iprintf()
. Also note that TTE's implementation is
different from libgba's, and the two should not be confused.
For that reason, I've hidden the iprintf()
name behind
a tte_printf
macro.
The following is a short example of its use. I'm using
tte_printf()
here, but printf()
or
iprintf()
would have worked just as well.
#include <stdio.h> #include <tonc.h> int main() { REG_DISPCNT= DCNT_MODE0 | DCNT_BG0; // Init BG 0 for text on screen entries. tte_init_se_default(0, BG_CBB(0)|BG_SBB(31)); // Enable TTE's console functionality tte_init_con(); tte_printf("#{P:72,64}"); // Goto (72, 64). tte_printf("Hello World!"); // Print "Hello world!" while(1); return 0; }
As wonderful as printf()
is, there are some downsides
to it too. First, it's a very heavy function that calls quite a large
amount of functions which all have to be linked in. Second, it is
pretty damn slow. Because it can do so much, it has to check for all
these different cases. Also, for the string to decimal conversion it
uses divisions, which is really bad for the GBA.
Be aware of how much printf()
costs. If it turns out to
be a bottle-neck, try making your own slimmed down version. A decent
sprintf()
alternative is posprintf()
,
created by Dan Posluns.
Every book on C will tell you that you can place text on a console screen. What they usually don't tell you is that, in some environments, you can control formatting as well. One such environment is the VT100, which used escape sequences to indicate formatting. The libraries that devkitPro distributes for various systems use these sequences, so it's a good idea to support them as well.
The general format for the codes is this:
CSI n1;n2 ... letter
CSI here is the ASCII code for the command sequence indicator, which in this case is the escape character (27, 0x1B or 033) followed by '['. The letter at the end denotes the kind of formatting code, and n1, n2 … are the formatting parameters. Wikipedia has a nice overview of the standard set here and there's another one at http://local.wasp.uwa.edu.au/~pbourke/dataformats/vt100/. Note that not all of the codes are supported in the devkitPro libraries. The ones you'll encounter most are the following:
ESC[dyA | Move cursor up dy rows. |
ESC[dyB | Move cursor down dy rows. |
ESC[dxC | Move cursor right dx columns. |
ESC[dxD | Move cursor left dx columns. |
ESC[y;xH | Set cursor to column x, row y. |
ESC[2J | Erase screen. |
ESC[nK |
|
ESC[y;xf | As ESC[y;xH |
ESC[s | Save cursor position. |
ESC[u | Restore cursor position. |
If you compare this list to table 22.4, you'll see that most of these codes have corresponding TTE commands. You can use either, but if you plan to make something that's supposed to be cross-platform, use the VT100 codes.
I'm trying to keep my implementation as close to the standard as possible. This is mainly because TTE uses other things just 8x8 characters on a regular background. In particular, scrolling is absent here and there are no color codes. Yet.
You may have heard of a little thing called ASCII. This is (or was; I'm not sure) the standard encoding for character strings. Each character is 1 byte long, giving 256 numbers for letters, numbers et cetera. Fig 22.1 and fig 22.12 contain character 32 to 255, as they usually appear on Windows. ASCII works fine for Western languages but are completely inadequate for languages like Japanese, which have thousands of characters. To remedy this, they came up with Unicode, which has 16 bits per character.
An intermediate between this is UTF-8. This still uses 8-bit characters for the lower 128 ASCII codes, but bytes over 0x80 denote the start of a multi-byte code, where it and a few of the following characters form a single, larger character of up to 21 bits.
UTF-8 is a nice way of having your cake and eating it too: you can still use normal characters for Latin characters, meaning it'll still work with ASCII programs, but you also have a method of representing bigger numbers.
String (binary) | Number (binary) | Range (hex) |
---|---|---|
0zzzzzzz | 0zzzzzzz | 0x000000 - 0x00007F (7 bit) |
110yyyyy 10zzzzzz | 00000yyy yyzzzzzz | 0x000080 - 0x0007FF (11 bit) |
1110xxxx 10yyyyyy 10zzzzzz | xxxxyyyy yyzzzzzz | 0x000800 - 0x00FFFF (16 bit) |
11110www 10xxxxxx 10yyyyyy 10zzzzzz | 000wwwxx xxxxyyyy yyzzzzzz | 0x010000 - 0x10FFFF (21 bit) |
Table 22.6 shows the conversion works. If a byte is lower than 128, it's a simple ASCII character. If it's higher, it can fall into three classes of multi-byte numbers. The range of the byte determines the number of bytes for the whole thing; once you know that, you need to grab the appropriate bit-patterns from these bytes and join them into a single number as the table indicates. For more details, I will refer you to the wikipedia page.
Below you can find a routine that reads and decodes a single utf-8
character from a string. Yes, it's a cluster-f**k of conditions,
but that's necessary to check whether all the characters really
follow the format; and if it doesn't, it'll interpret the first byte
of the range as an extended ASCII character. If you want, you can
omit all the `if((*src>>6)!=2) break;
' statements.
//! Retrieve a single multibyte utf8 character. uint utf8_decode_char(const char *ptr, char **endptr) { uchar *src= (uchar*)ptr; uint ch8, ch32; // Poor man's try-catch. do { ch8= *src; if(ch8 < 0x80) // 7bit { ch32= ch8; } else if(0xC0<=ch8 && ch8<0xE0) // 11bit { ch32 = (*src++&0x1F)<< 6; if((*src>>6)!=2) break; ch32 |= (*src++&0x3F)<< 0; } else if(0xE0<=ch8 && ch8<0xF0) // 16bit { ch32 = (*src++&0x0F)<<12; if((*src>>6)!=2) break; ch32 |= (*src++&0x3F)<< 6; if((*src>>6)!=2) break; ch32 |= (*src++&0x3F)<< 0; } else if(0xF0<=ch8 && ch8<0xF8) // 21bit { ch32 = (*src++&0x0F)<<18; if((*src>>6)!=2) break; ch32 |= (*src++&0x3F)<<12; if((*src>>6)!=2) break; ch32 |= (*src++&0x3F)<< 6; if((*src>>6)!=2) break; ch32 |= (*src++&0x3F)<< 0; } else break; // Proper UTF8 char: set endptr and return if(endptr) *endptr= (char*)src; return ch32; } while(0); // Not really UTF: interpret as single byte. src= (uchar*)ptr; ch32= *src++; if(endptr) *endptr= (char*)src; return ch32; }
Both tte_write()
and tte_write_con()
use utf_decode_char()
when the string requires it. The
larger characters can be used to access larger font sheets. You could
use the larger sheets for better language support, or perhaps to extend
the standard set of characters with arrows, and other types of symbols.
There is, however, one catch to using UTF-8 with stdio. Internally,
stdio is really picky about what's acceptable. For example, the
copywrite symbol, © is extended number 0xA9. In non-UTF-8, you
could use just 0xA9 in a string and it'd use the right symbol.
However, 0xA9 alone wouldn't fit any of the formats from
Table 22.6, so it's an invalid code in
UTF-8. While utf8_decode_char()
is forgiving in this
case, stdio isn't, an will interpret it as a terminator. In other
words, be careful with extended ASCII character; you have to
use the proper UTF-8 formats if you want to use the stdio functions.
As of devkitArm r22, printf()
and the other stdio
functions use the UTF-8 locale. This effectively means that you
cannot use characters like ‘©’ and
‘è’ directly like you used to in older versions.
You need to use the full multi-byte UTF-8 notations.
It's always a good idea to see how fast the things you make are. This is particularly true when the functions are complex, like most of the bitmap and tile renderers are.
Table 22.7 lists the cycles per glyph for the majority of the available renderers. These have been measured with the string (and library code) in ROM with the default waitstates, under -O2 optimization. The font used was verdana 9, with has a cell size of 8x16, meaning it can be used for both fixed and variable widths with ease. The test string was a 194 character line from Portal:
“Please note that we have added a consequence for failure. Any contact with the chamber floor will result in an 'unsatisfactory' mark on your official testing record followed by death. Good luck!”
Renderer | Cycles/char |
---|---|
null | 221 |
se_drawg | 595 |
se_drawg_w8h16 | 370 |
ase_drawg | 773 |
ase_drawg_w8h16 | 458 |
chr4_drawg_b1cts_base | 3049 |
chr4_drawg_b1cts | 2044 |
chr4_drawg_b1cts_fast | 631 |
bmp8_drawg_b1cts_base | 2875 |
bmp8_drawg_b1cts | 2078 |
bmp8_drawg_b1cts_fast | 619 |
bmp16_drawg_b1cts_base | 2456 |
bmp16_drawg_b1cts | 1503 |
obj_drawg | 423 |
First, note the great differences in values: from hundreds for the tilemaps and objects to thousands in the case of bitmaps and tile renderers. And this is per character, so writing large swats of text can lead to significant slowdown.
The null()
renderer is a dummy renderer, used to
find the overhead of the TTE system. 200 isn't actually that bad,
all things considered (remember: ROM code). That said, now
compare this number to the regular tilemap time: the overhead is
takes up a significant fraction of the time here. Also note the
difference between the standard and 8×16 versions of
se_drawg
: this is purely due the loops
Half of the TTE overhead actually comes from the wrapping code; cursor setting and checking can be relatively slow. And I'm not even considering clipping here.
For the bitmap and tile renderers, I've timed three versions. A
‘base’ version, using the template from
chr4_drawg_b1cts_base()
in
section 22.6.2.; C-optimized versions,
which are the default renderers; and a fast asm version.
The bmp16
variants are faster than the others because
you don't have to mask items into the surface. What's interesting,
though, is that the difference between bmp8
and
chr4
is practically zero. This probably has something
to do with the layout of the font itself.
Also note how the base, the normal and the fast versions compare.
chr4_drawg_b1cts()
is 33% faster than the base version,
and chr4_drawg_b1cts_fast
is three times faster still.
And remember, 200 of that 631 is TTE overhead, so it's actually
4.5 times faster. This is not just from the IWRAM benefit: it also
has to do with ARM vs Thumb, and hand-crafted assembly vs compiled
code.
As far as I'm concerned, this chapter is basically the earlier text chapter done right. It's covered all types of graphics:regular/affine tilemaps, 8bpp/16bpp bitmaps, 4bpp tiles and objects. Okay, so I left 8bpp tiles out, but that's an awful mode for tile-rendering anyway. The functions for glyph rendering given here are work for arbitrary sizes, fixed and variable width fonts and should be doing so efficiently as well.
Furthermore, it has presented Tonc's Text Engine, a system for
handling all these different text families with relative ease. After the
initial set-up, the surface-specific aspects are basically dealt with,
making its functionality much more re-usable. I've also covered the most
basic aspects of processing strings for printing: how to translate from
a UTF-8 encoded character to a glyph-index in a font-sheet, and
how you can implement formatting tags to change positions, colors and
fonts dynamically. Lastly, I illustrated how you can build a callback
that the stdio routines can call for output, making
printf()
and its friends available for general use.
This whole chapter has been a showcase for TTE and what it can do. Even though it's not in a fully finished state, I think that it can be a valuable asset for dealing with text. If nothing else, the concepts put forth here should help you design your own glyph renderers or text systems.
Prev | Contents | Next |
More mode 7 | Assembly |