    About twenty years ago a comics magazine presented a little game called "100 game" (at least here in Italy). In a 10×10 matrix the player had to fill the single cells with a number, incremented at every step, starting from the first cell (top-left) and moving around in the eight possile directions (horizontal or vertical by skipping two cells, or diagonal by skipping one cell), provided that the destination cell is not already filled.

A solution is found when the player can fill the entire matrix without getting to a "dead way". After some months, I got it (by paper and pen and a lot of patience), while wondering "If I've had a computational machine...". At that time, I wasn't aware of the P/NP (non-polynomial-completeness) question, neither I had decent knowledge about exponential-growth order algorythms. My first computer, in 1984, was a Sinclair ZX Spectrum, featuring a Z80A 8 bit processor at a stunning 3.5 MHz clock speed and 48k RAM.

 Beh, sciroppatevela in inglese: m'shfàstaréo a tradurla pure in italiano...
Now I have a degree in Computer Science, some knowledge of P/NP, master a bunch of programming languages, hardware, etc; I have also a roaring P3/750 notebook with 192Mb RAM (well, I later switched to an Apple Powerbook G4/1.67 with 1Gb RAM), but I got back to that Very Young Boy Dream to get a greedy algorythm solution for it.

The first attempt was the listing 1 below, almost entirely resembling my first Spectrum Basic attempt (that never run because was never completed: I didn't have recursion, there!) in late 1984.

Well, my friend Giosi got a 100×100 solution using a nice trick and Java...!

The algorythm reads as "for every possible move, see if you can continue from there. If yes, then try from there the same way. If the program returns from there, it means that it wasn't a solution". So, it's an O(exp(7,n²)) (and -ouch!- the proposed comics game assumed n=10), because every node of the tree has seven children (the eight possible directions except the one where you came).

The tree is not complete but the average case is quite near to worst case. Finding a solution means finding one path from the root to a leaf (one of the "solution" leaves) at depth , no matter where you go - this means also that the best case is shown only if you don't apply the same strategy while descending the tree: the solution for a non-polynomial algorythm has to be searched in an adaptive strategy while selecting "which node I will descend now": using a fixed strategy you will descend a lot of useless leaves (not at depth ). A good case is when you find a path that quickly resolves in a "no way", but with little vales of n this gets really rare.

Well, a solution in reasonable time (some minutes) for n<=10 can be found even with a greedy approach.

//   cento-standard.c   http://www.retepnet.it/alfmar/   Feb'2002

#include <stdio.h>

#ifndef SIZE
#define SIZE 10
#endif

int m[SIZE][SIZE], ctr=0, maxz=0;

#define bound(a,b)    ((a)>=0 && (a)<SIZE && (b)>=0 && (b)<SIZE)

void show()
{
int a,b;
printf("\033[;H\n");
for(a=0; a<SIZE; a++)
{
for(b=0; b<SIZE; b++)
if(m[a][b])
printf("%-3d", m[a][b]);
else
printf("   ");
printf("\n");
}
printf("\nmax N = %d  --  recursions: %d\n", maxz, ctr);
}

int vai(int x, int y, int z)
{
ctr++;
if(!bound(x,y)) return 1;
if(m[x][y]) return 1;
if(maxz<z) maxz=z;

m[x][y]=z;
show();

if(++z>SIZE*SIZE)
{
printf("\n\nOK!\n\n");
exit(0);
}

if(vai(x+3, y, z) && vai(x+2, y+2, z) && vai(x, y+3, z) && vai(x-2, y+2, z)
&& vai(x-3, y, z) && vai(x-2, y+2, z) && vai(x, y-3, z) && vai(x+2, y-2, z))
{
m[x][y]=0;
}

return 1; // ouch!
}

int main(int argc, char **argv)
{
printf("\033[2J");
if(vai(0,0,1)) printf("impossible\n");
return 1;
}

This kind of approach gives about 200 thousands recursion calls per second on my P3/750 machine, compiled with gcc 2.96 with standard optimizations ("-O2"). It finds quickly (a matter of seconds) the solution for n=5 and n=6 (for n<5 a solution does not exist). But, when you try with a SIZE of 7 or above, you will see countless recursions and countless hours of work...

How can be optimized it, without changing the irrational greedy approach with fixed strategy?

Well... oops... er... YES, use a bit of assembly programming for the first time in more than eight years of Linux programming! =:-)

Some hints:

• don't show the entire matrix at every step, but at fixed times; e.g., every two or three seconds, using SIGALARMs;

• allow different strategies, not only the "clockwise" strategy;

• the only function that I need to code in assembler is the recursive function; the show-to-screen procedure can be in C;

• don't repeat the time-consuming bound checks: if you add 4+4 fake rows and 4+4 fake columns containing "illegal" values, the program won't need anymore to check if matrix indices are legal or not;

• don't use integer (4-bytes) values for a cell, but a single signed byte (so that memory accesses are cut down by 75%);

• don't use two matrix indices, but a pointer to a fixed-size matrix, so that "moving up three rows" means "subtract from the pointer three times the size of a row";

• don't push on stack pointer parameters, since you can adjust the pointer at the end of the parsing of the seven new possible directions;

• use only register values (don't access memory except for filling a cell or reading its value), avoiding even immediate arguments to instructions;

• page-align critical labels, save information by recycling flag values, etc.

Well, all these lead to a stunning speed of over 83 millions of recursions per seconds on my 750 MHz machine (yes, nine clock cycles per recursion call!!): the 10×10 matrix is completed in less than 150 seconds of CPU usage (while running X/Free, Linux 2.4.17, etc). I think one can do even better if knows the internals of Pentium-III cycle optimizations, branch prediction, etc, but the assembler listing below shows how assembly reengineering of a critical function can boost speeds of two orders of magnitude.

First, the C listing that prepares the matrix and does the output:

//
//      cento.c                  http://www.retepnet.it/alfmar/ -- Feb'2002
//
//      greedy approach to get a recursive solution for the Little "100" Game
//
//      compile with paranoid optimizations:
//          gcc -O6 -Wall -fomit-frame-pointer cento.c centro.o -o cento
//          strip -s -R .comment -R .note cento
//

// --- configuration ---

#define  SECS          3    // timeout for refreshing screen table

// --- library ---

#include <stdio.h>     // printf
#include <signal.h>    // signal
#include <unistd.h>    // alarm
#include <stdlib.h>    // exit
#include <string.h>    // strcpy

extern int do_calc(char *matrix_ptr, int starting_number, int maximum_value);

// --- local variables ---

char m;       // the full matrix (for up to 120x120)

char output[2+120*120];    // pre-formatted values

int n=10;               // matrix order

// --- the show applet ---

void showmatrix()
{                              //  non necesse optimizare hic
int a,b, amax=n, bmax=n;
printf("\033[;H\n");         //  reset cursor position

if(amax>24) amax=24;         //  because of terminal window limits
if(bmax>24) bmax=24;

for(a=0; a<amax; a++)
{
for(b=0; b<bmax; b++) printf("%s", output[1+m[a+4][b+4]]);
printf("\n");
}
}

void show(int sig)             //  shows the current table to screen
{
showmatrix();
signal(SIGALRM, show);
alarm(SECS);
}

// --- initialization/startup ---

int main(int argc, char **argv)
{
int a, b;

if(argc>1) sscanf(argv, "%d", &n);  // presume n<=120

printf("\033[2J");  // clear terminal screen

for(a=0; a<4; a++)
for(b=0; b<128; b++)
m[a][b]=-1;     // keep out some rows, to save a bound-checking

for(a=n+4; a<128; a++)
for(b=0; b<128; b++)
m[a][b]=-1;     // save another bound-checking (rows below)

for(a=4; a<n+4; a++)
{
for(b=0; b<4; b++)
m[a][b]=-1;     // columns to the left
for(b=n+4; b<128; b++)
m[a][b]=-1;     // columns to the right
}

strcpy(output, "***");           // precompile strings
strcpy(output, "__ ");
for(a=1; a<=n*n; a++) sprintf(output[a+1], n>10 ? "%-3x" : "%-3d", a);

show(SIGALRM);                      // simulates the first ALARM/show

a=do_calc(&m, 1, n*n);        // go!

alarm(0);                           // stop screen update...
showmatrix();                       // ...but be sure the solution is shown!

printf("\n[%s]\n", a ? "impossible" : "OK!!!");
return a;
}

And now, the assembler listing of the recursive call.

;
;       centro.asm                         http://www.retepnet.it/alfmar
;
;       assembler implementation of the Little 100 Game recursive calculus
;
;       compile with:  nasm -felf centro.asm
;       then link the centro.o object file to the calling C program
;

; --- strategies (select one) ---

%define  STRATEGY  REGULAR

%define  CLOCKWISE          movearound  E, SE, S, SW, W, NW, N, NE
%define  CLOCK_UNWISE       movearound  S, SE, E, NE, N, NW, W, SW
%define  HOLDYOURBACK       movearound  NW, W, N, SW, NE, S, SE, E
%define  PINGPONG           movearound  W, E, N, S, SW, NE, SE, NW
%define  REGULAR            movearound  W, N, S, E, NW, SW, SE, NE
%define  NONREGULAR         movearound  NW, SW, SE, NE, W, N, S, E

; --- definitions ---

%define  N   (-384)
%define  S   (384)
%define  E   (3)
%define  W   (-3)
%define  NE  (-254)
%define  NW  (-258)
%define  SE  (258)
%define  SW  (254)

%macro go 2
call calc
%endm

%macro movearound 8
go  0, %1
go %1, %2
go %2, %3
go %3, %4
go %4, %5
go %5, %6
go %6, %7
go %7, %8
add ebx, dword (-%8)      ; this is to adjust original ebx value
%endm

; --- the function called only once (to start the recursion) ---

global do_calc  ; this is the function name

do_calc ; int do_calc(char *matrix_ptr, int starting_number, int max_number)
enter 0,0
push ebx
push ecx
push edx
push esi
mov esi, esp
xor eax, eax
mov ebx, [ebp+8]        ; matrix_ptr
mov ecx, [ebp+12]       ; starting_number
mov edx, [ebp+16]       ; max_number
call calc
ret_here    setnz al                ; return value is zero only if the entire
movzx eax, al           ;   matrix was filled
mov esp, esi
pop esi
pop edx
pop ecx
pop ebx
leave
ret
align 16       ; some paranoia alignment...

; --- the recursive function ---
;
; we get here with:
;    ah  -- zero
;    ebx -- points to the address of the next cell to be filled
;    ecx -- next number to place, always >= 1
;    edx -- the last value to place in the matrix

calc:   mov al, [ebx]           ; the recursion begins exactly here!
or al, al
jz able                 ; test cell and return with nz flag if we
ret                     ;   cannot continue from that point
align 16

able:   mov [ebx], cl           ; so, try to fill next cell
inc ecx
cmp ecx, edx            ; check if we really did it
jbe fill
xor eax, eax            ; go out if filled, leaving z flag by xoring
jmp short ret_here
align 16

fill:   STRATEGY                ; apply currently defined strategy

mov [ebx], ah           ; if we get here then it was a closed way, so
dec ecx                 ;   free the current cell and go back one move
ret                     ;     and return with nz because always ecx!=1

; ---

That's all, folks! I already hear lots of Programming Purists crying for my bad-bad-BAD-BAD greedy approach, but I got what I was searching for. Maybe one day I'll rework on it to get a solution of up to 120×120 in reasonable time (the 128×128 matrix was born because of this), but my intent was only to fiddle with assembly in Linux.

YES, if you try to get a 9×9 solution (theoretically simpler), you'll have to wait three hours. So, the 10×10 was not the worst case, but I'm happy I found it in less than three minutes. Here is it for you (notice the horizontal/vertical moves preferred to diagonal moves):

 1 48 29 8 47 30 9 46 33 16 78 69 94 79 70 95 86 71 96 87 36 19 64 35 18 63 34 17 62 45 2 49 28 7 54 31 10 55 32 15 77 68 93 80 65 92 85 72 97 88 37 20 57 40 23 56 41 24 61 44 3 50 27 6 53 26 11 91 84 14 76 67 100 81 66 99 82 73 98 89 38 21 58 39 22 59 42 25 60 43 4 51 75 5 52 74 12 90 83 13

Last note: my friend Giosuè was working on the same problem, but using Java instead of C and assembler. 