Yes, I found a funny article on a Commodore 64 fan site, claiming to show that the C64 processor (the MOS 6510) was computationally faster, at least at something, than the Z80A of the Zx Spectrum.

That was a false statement. In fact, the 3.5 MHz Z80A of the Sinclair Zx Spectrum kicks the 0.98 MHz MOS 6510 hard in the ass.

That article ("ffd2/fridge/speccy/score") contained, as in the Most Standard Commodorian-Fan Speech, factual inaccuracies, fake assumptions, unproven assertions, stupid slogans, unsubstantiated statements...

Below, I will quote everything he says, adding my comments. Since we are showing Z80-class and 6502-class features, "Z80" and "Z80A" will be considered synonyms, and also "6502" and "6510".


The background:

The Spectrum consists of a Z80 running at 3.54 MHz, a 256x192 bitmapped
display, a simple sound chip, and 48k.  The C64 consists of a 6510
running at 1MHz, a sophisticated sound chip, 64k, and a sophisticated
video chip, offering 16-colors, text modes, custom character modes,
320x200 bitmap modes, 160x200 multicolor bitmap modes, sprites,
raster interrupts, blah blah blah and so on.

First, let me say that the European version of the Commodore 64 ran at less than 1 MHz. It was 0.98 MHz, yes, zero-dot-nine-eight, yes, 982 kHz. Apart from the Commodore 64, I've never seen a computer running at less than one megahertz. The Spectrum Z80A "runs" like a cheetah at 3.5MHz while the Commodore 64 "runs" --oops, no, "walks" like a turtle at 0.98MHz. Yes, megahertz matters.

Then, let me say that if you want to compare raw processor speed in performing particular tasks, you should save us your "sophisticated chips" sermon (a Commodorian fan standard): these comparisons are performed on "bare microprocessor power", between the 3.5 MHz Zilog Z80A of the Zx Spectrum and the 0.982 MHz MOS 6502 of the Commodore 64.

Note: I will assume the Zx Spectrum running at 3.5 MHz, not 3.54 (the "3.54" thing seems to be Spectrum 128 related).

While it is clear that the 64 will outperform the Spectrum on tasks
related to text, sprites, etc. it was not clear what sort of performance
ratio existed for straight calculations and applications involving
bitmapped graphics: for example, 3D graphics programs.
Again the C64 Sermon, how boring. But -well- let's be more precise.

"Text": while in text-mode, the Commodore 64 needs to write one byte to draw a character, and another byte to set up its color. That is, two bytes, plus time required to calculate their respective addresses. The Zx Spectrum does not have a text mode.

Zx Spectrum only has a graphics mode (a full bitmap always available at a fixed address, not requiring any initialization/setup). It requires eight bytes to be written into display memory to form an 8×8 character cell, plus an attribute byte to set up its color. That is, nine bytes, plus time required to calculate cell address and attribute address (due to its peculiar screen mapping, going through the 8 scanlines to write a character requires only a simple 8-bit register increment, which is almost as fast as a "NOP" instruction).
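The "simple 8-bit register increment" remark can be checked with a small sketch of the Spectrum display mapping. The function names below are made up for illustration; the bit layout is the commonly documented one (display file at $4000, attributes at $5800), so treat this as a sketch under that assumption rather than as code taken from any ROM.

```python
SCREEN_BASE = 0x4000   # start of the Spectrum display file
ATTR_BASE = 0x5800     # start of the attribute (colour) area

def pixel_row_address(col: int, y: int) -> int:
    """Address of the byte at character column col (0-31), pixel row y (0-191)."""
    return (SCREEN_BASE
            | ((y & 0xC0) << 5)   # which third of the screen
            | ((y & 0x07) << 8)   # scanline inside the character cell
            | ((y & 0x38) << 2)   # character row inside that third
            | col)

def attribute_address(col: int, char_row: int) -> int:
    """Address of the colour attribute for the cell at (col, char_row)."""
    return ATTR_BASE + char_row * 32 + col

# The eight bytes of the character cell at column 5, character row 3.
cell = [pixel_row_address(5, 3 * 8 + line) for line in range(8)]
steps = {b - a for a, b in zip(cell, cell[1:])}

print([hex(a) for a in cell])
print("step between scanlines:", {hex(s) for s in steps})
```

Every step between the scanlines of one cell is $100, so only the high byte of the address changes: on the Z80 that is a single "INC H" when the address sits in HL.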

Thus, the C64 "outperforms" only when in text-mode compared to graphics bitmap mode. That is, "outperforms" only when you compare bananas to apples...


Sprites: yes, you "outperform" on sprites, yes, the Zx Spectrum does not have sprites. The Commodore 64 is a "professional computer" (according to the label below its keyboard) which has "sprites" (the standard videogames feature). "Professional" with "sprites", something like putting Diet Coke into your precious Brunello di Montalcino wine. Great. I always told you that Commodorian fan sermons make me either laugh or feel bored.

"Etc"... er... what's "etc"? Where are the other "outperformings"?

A number of people claimed that the Spectrum would significantly outperform
the C64 on general calculations and 3D graphics programs; these claims
were based primarily on the 3.54:1 MHz ratio and personal bias, but
were certainly claims worth investigating.
Investigations are welcome, but the 3.5MHz to 982kHz ratio is significant unless you definitely prove that you are comparing apples with bananas. No: you are comparing two processors doing the same things ("fetch a byte, add two registers, store a byte..."). It's not 5-10%, it's three and a half times (3.564 is the more precise figure); if you try to erase this fact, then the "personal bias" you're blaming is only yours!
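For the record, the clock ratio is a one-line calculation. The sketch below uses the two clock figures assumed throughout this article (3.5 MHz and 982 kHz); the constant names are only illustrative.

```python
# Clock figures as assumed in this article, not measured here.
Z80_HZ = 3_500_000       # Zx Spectrum Z80A
MOS6510_HZ = 982_000     # European Commodore 64

clock_ratio = Z80_HZ / MOS6510_HZ
print(f"clock ratio: {clock_ratio:.3f}")   # prints "clock ratio: 3.564"
```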

I began looking for insight into three main questions:

1. How do the Z80 and 6502 differ, and what is the Z80 approach to solving
problems/how does it differ from the 6502 approach?

2. How would the Spectrum perform on the kinds of programs that I write?

3. Is there any merit to the earlier claims about the clear superiority
of the Spectrum a) in general and b) specifically for 3D graphics?
Hmmm... the kinds of programs that "you" write?

Some extra small questions: do you write Z80 assembler software as well? Did you ever use the second set of Z80 registers? Do you know its "repeating" instructions like "LDIR", "CPDR", "INIR" and so on? Did you ever use their non-repeating equivalents ("LDI", "CPI", "OUTI" and so on) to get multiple things done in one instruction? Did you ever test the so-called "undocumented" Z80 instructions? (In spite of their name, they are widely known and used because they often save programming time, code size and execution time.)

Looking at the results below, I guess not. It seems that, except in some notable cases (like the "LDIR" of note 1), you are using the Z80A as if it were an 8080 or a 6502...

Some technical discussion eventually ensued, and a number of code snippets
were compared. Cycles were counted, opinions were declared, but
through it all a few things became apparent, and new insights were
gained.
Alas, we don't find "where" you counted the cycles. In the article you present a list of figures without any line of code, plus a dead link to a non-existing web site. And "new insights" seems just another name for "Commodorian bias".

General conclusions:

1. The people who were loudly extolling the virtues of the Spectrum over
the 64 not only did not understand the broad issues (i.e. have any
practical 6502 or C64 programming experience to compare with), they did
not even understand the specific issues (e.g. practical experience
in doing 3D graphics, or drawing lines).
This statement applies to Commodorian people as well. There are not many people out there programming on both the Zx Spectrum and the Commodore 64.

But, as we will see later, the statement applies only to Commodorian fans.

2. The typical cycle ratios are around 3:1.

Seven programs have been considered: slow multiply, block mem transfer,
substring search, three line routines, and the fast multiply. The slow
multiply runs at 2:1. The non-LDIR memory copy runs at 3:1. The
substring search typically runs at 3:1. The line routine runs at
3:1, with unrolling bringing it down to 2.7:1. In practical use
(e.g. a matrix multiply) the fast multiply runs at > 3:1.
You should say what you mean by "runs at 2:1". Is this a cycle count? Then it must be weighed against the clock ratio, because the Zx Spectrum has 3.5 million "clockcycles" in a second, while the Commodore 64 has 982 thousand "clockcycles" every second.

Or, was it a real figure? Does it mean that Zx Spectrum "runs at 2:1" (two times faster than Commodore 64)? I guess this is exact. But... why did you criticize people "extolling the virtues of the Spectrum"? Why did you forget the "virtues" that you were going to demonstrate? Another Commodorian Blah-Blah-Blah, eh?

From this I conclude that typical cycle ratios will be 3:1, in particular
for the kinds of programs I write. Obviously, some algorithms will
do better, and others will do worse.
Then let's see where "better" and where "worse", and let's actually verify your "3:1 ratio" assertion...

Conclusions to my specific questions:

1. The Z80 is based around its registers. Algorithms which fit entirely
within the registers do very well, especially for 16-bit applications.
Memory access is generally done indirectly via (HL), which tends to
favor sequential memory access. The stack pointer is 16-bits and may
be used for useful things. Branching and jumping are generally
slow, as are direct memory address and absolute numbers, and indexing
is fairly dissimilar. The Z80 has a number of specialized instructions
which are used in a variety of tasks, and lacks the range of addressing
modes offered by the 65xx.
First lie: "branching and jumping is slow". Let's evaluate "BNE" (branch when not equal) on the 6502, and the Z80 equivalent "JP NZ" (jump when not zero). BNE requires 2 clockcycles if not jumping, or 3 clockcycles when jumping; JP NZ requires 10 clockcycles whether it jumps or not (the shorter relative form, "JR NZ", requires 7 if not jumping, or 12 if jumping). How do these figures compare? Demonstration: at 0.982 MHz a taken BNE lasts 3 / 0.982 = 3.05 microseconds, while at 3.5 MHz a JP NZ lasts 10 / 3.5 = 2.86 microseconds. On a taken branch, which is what a loop executes on every pass but the last, the Spectrum is faster than the C64; only the untaken BNE (2.04 microseconds) stays ahead. Oh, and I did not mention the performance hit when the "BNE" jump crosses a 256-byte page boundary, requiring an extra clock cycle (that is: worsening its poor performance by a third): the "BNE" works well only inside a 256-byte page! Commodorian programmers will have to "encapsulate", as far as possible, their assembler programs in "chunks" of 256 bytes!
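A rough sketch of the taken-branch comparison, converting cycle counts into microseconds. The cycle counts are the documented ones for a taken BNE (3, or 4 across a page boundary) and for JP NZ (10); the clock figures are the ones assumed in this article, and the helper name is made up.

```python
Z80_HZ = 3_500_000       # Zx Spectrum, as assumed in this article
MOS6510_HZ = 982_000     # European C64, as assumed in this article

def microseconds(cycles: int, clock_hz: int) -> float:
    """Wall-clock time of an instruction, in microseconds."""
    return cycles / clock_hz * 1e6

bne_taken = microseconds(3, MOS6510_HZ)             # taken, same page
bne_taken_page_cross = microseconds(4, MOS6510_HZ)  # taken, page boundary crossed
jp_nz_taken = microseconds(10, Z80_HZ)              # taken JP NZ

print(f"BNE taken:              {bne_taken:.2f} us")
print(f"BNE taken, page cross:  {bne_taken_page_cross:.2f} us")
print(f"JP NZ taken:            {jp_nz_taken:.2f} us")
```

The same helper works for any other instruction pair: feed it the cycle count and the clock, and compare the results instead of the raw cycle counts.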

Second lie: "direct memory access is slow". Let's compare "INC $nnnn" (increment the byte at location nnnn, the "direct memory access" of the 6502) with the two Z80 instructions needed to simulate it, that is "LD HL,$nnnn ; INC (HL)". The 6502 "direct" form requires only 6 clockcycles, while the Z80's "LD HL,$nnnn" requires 10 clockcycles and the subsequent "INC (HL)" (increment the byte pointed to by HL) requires 11 clockcycles, for a total of 21 clockcycles on the Z80: that is 6 / 0.982 = 6.11 microseconds on the C64 against 21 / 3.5 = 6.00 microseconds on the Spectrum. But the first "LD HL" can sometimes be saved (it's only an initialization); if you happen to need to increment the same variable twice, you have 12 clockcycles (12.22 microseconds) on the C64 against 10 + 11 + 11 = 32 clockcycles (9.14 microseconds) on the Spectrum.

Commodorian bias: "Z80 lacks addressing modes". But, as shown in the "second lie" case, the Z80 can simulate missing modes and still win the match.
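The "second lie" arithmetic generalises to any number of increments of the same variable. This is only a sketch: the function names are made up, the cycle counts are the ones quoted above (6 for "INC $nnnn", 10 for "LD HL,$nnnn", 11 for "INC (HL)"), and the clocks are the ones assumed in this article.

```python
Z80_HZ = 3_500_000       # Zx Spectrum, as assumed in this article
MOS6510_HZ = 982_000     # European C64, as assumed in this article

def c64_increment_us(times: int) -> float:
    """Time for `times` INC $nnnn instructions on the 6510."""
    return 6 * times / MOS6510_HZ * 1e6

def z80_increment_us(times: int) -> float:
    """Time for one LD HL,$nnnn followed by `times` INC (HL) on the Z80."""
    return (10 + 11 * times) / Z80_HZ * 1e6

for times in (1, 2, 4):
    print(f"{times} increment(s): C64 {c64_increment_us(times):.2f} us, "
          f"Spectrum {z80_increment_us(times):.2f} us")
```

The one-off "LD HL" cost is spread over every later increment, so the gap widens as the count grows.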

Useful hint: among the famous Z80 "undocumented" opcodes there are quite a number of operations (like "rotating a memory value", better than the 6510) which will save code size and execution time. I think Commodorians have already used all of their "powerful" "undocumented" codes as well.

The 65xx is based around fast access to memory, in particular zero
page, and its index registers. Algorithms which involve scattered
memory accesses do very well, as do programs which make heavy
use of branching and subroutines. The ability to add, compare, etc.
directly from memory (ADC $C002 : CMP $D020) means that algorithms
involving large amounts of variables, tables, pointers, etc. will
perform much better on the 65xx. Absolute operations (ADC #$21)
are significantly faster (2 cycles on 6510 vs. 7 on Z80). Algorithms
involving relatively few variables bog down in comparison.
What does "fast access" mean? Perhaps, did manufacturers like Zilog build their processor in order to have a "slow access" only? Let's wipe out the Commodorian Bias and let's go on.

Commodorian Bias: "zero page" is the first 256-byte page. Yes, 256 bytes only. And yes, it already holds the 6510 port registers, system variables and other "untouchable" (or "beware!") data. How many variables and buffers can you store there? The main advantage of "page zero" instructions is not speed, but a memory saving: a "zero page" access saves one byte of code because the address size is only 8 (instead of 16) bits. "Zero page" faster access will save a few clock cycles, while condemning Commodorian assembler programmers to all-night fights over that ridiculously small area where those single-cycle-and-single-byte-saving instructions can be performed. How about 30+ kilobytes of assembly code always having to rely on a 256-byte page?

Bias Again: "scattered memory accesses do very well". It seems that Zx Spectrum programmers work on clean, ordered software source, while Commodorians work only on dirty, scattered, messy, spaghetti-like software source...! Planning software implies choosing between "scattered" and "non-scattered" models. And having a bunch of registers (like those of the Z80, absent in the 6502) means saving lots of intermediate operations in both cases.

Third lie: "heavy use of branching and subroutines". Where do they perform better? We've already seen the "JP NZ" example, let's see the "RET" example now. The "return from sub-routine" instruction of the 6502 is called "RTS" and requires 6 clockcycles, while the Z80 equivalent is called "RET" and requires 10 clockcycles: 6 / 0.982 = 6.11 microseconds for the C64, 10 / 3.5 = 2.86 microseconds for the Spectrum.

Fourth lie: "absolute operations... significantly faster". Huh? What does it mean? Let's see an example. Adding the immediate value $21 to the accumulator: "ADC #$21" on the 6502 and "ADD A,$21" on the Z80. On the 6502 it requires 2 clockcycles, on the Z80 it requires 7 clockcycles. But when we do a little calculation, we see that the C64 needs 2 / 0.982 = 2.04 microseconds and the Spectrum needs 7 / 3.5 = 2.00 microseconds. Well... ehm... uh-oh... When Commodore 64 fans say their machine is "significantly faster", then it actually means that the Zx Spectrum outperforms the C64. And yes, all of my friends "knew" that their C64 was "powerful" and "fast"... har! har! har!
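The same conversion, applied to the two instruction pairs just discussed. A minimal sketch: the cycle counts are the ones quoted above, the clocks are the ones assumed in this article, and the table layout is only for illustration.

```python
Z80_HZ = 3_500_000       # Zx Spectrum, as assumed in this article
MOS6510_HZ = 982_000     # European C64, as assumed in this article

# (6510 instruction, 6510 cycles, Z80 instruction, Z80 cycles)
PAIRS = [
    ("RTS",      6, "RET",       10),
    ("ADC #$21", 2, "ADD A,$21",  7),
]

timings = {}
for c64_name, c64_cycles, z80_name, z80_cycles in PAIRS:
    c64_us = c64_cycles / MOS6510_HZ * 1e6
    z80_us = z80_cycles / Z80_HZ * 1e6
    timings[c64_name] = (c64_us, z80_us)
    print(f"{c64_name:<9} {c64_us:5.2f} us   {z80_name:<10} {z80_us:5.2f} us")
```

In both rows the Z80 needs more clockcycles and still finishes first, by a wide margin for the subroutine return and by a hair for the immediate add.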

Commodorian Bias: "large amount of variables... bog down...": these claims do not prove anything. To gain some (disputable) advantage, you "should" have a large amount of messy scattered data, always try to access all of it at once, and work in such a manner that saving data to registers will not help you. Hey, it seems the Art of Heavily Acrobatic Programming! And it also underestimates the Z80 registers, whose purpose is not the trivial "store some data in them to save some memory access".

Thus, Z80 algorithms try to fit all the variables into registers,
reduce memory access to sequential or page-aligned accesses (so that
HL may be used), try to avoid branching/decision making, and try to
use specialized instructions like DJNZ.
6510 algorithms make heavy use of the index registers, zero-page
and absolute operations, don't mind lots of branching/decision making,
and try to avoid too many operations on variables.
Commodorian Bias again and again (but much of it has already been debunked above). Well, the above paragraph could be called "Commodorians do not know how to write software". They think that register allocation is done once per program, and never changes at runtime. They think that branching and decision making is faster on the "less than one megahertz" processor, even after comparing actual timings. They think they can "scatter" messy variables everywhere and operate on them anytime. They think processor registers are useless. They think, they think, they think, they think... they stink.

2. For things like 3D graphics, the Spectrum probably has a 10%-20%
speed advantage. While nontrivial, it is by no means decisive. For
programs involving any text, sprites, etc. the Spectrum will clearly
suffer.
Wonderful. After a bunch of false or biased statements, the Commodorian Guy still shows numbers. This time he admits that the Zx Spectrum "probably" (waddayomean: "probably?") has a 10-20% speed advantage (is it "10%" or is it "20%"?); better, a "nontrivial" advantage of the Zx Spectrum.
Good. This is one of the rare times we see a Commodore fan admitting that the Zx Spectrum is "nontrivially" better than the Commodore 64. But even admitting a "probably 20%" better speed, the Commodorian guy immediately says "by no means decisive". Amazing. I couldn't ask for a better demonstration of the Typical Commodorian Bias.

You now understand what Commodore fans are. "6502 is better, 6502 is faster, 6502 allows memory access, 6502... 6502... yadda-yadda-yadda... the Zx Spectrum is probably nontrivially faster, but by no means decisive... but we Commodorians have sprites, and Sinclairists don't have sprites! Yay! We're better! You Spectrumists suffer! Yay!"

3. You gotta be kidding. As if. :)


Finally, the numbers. All code may be found at
http://stratus.esam.nwu.edu/~judd/fridge/
Dead link.

Shootout at the 0K Corral
-------------------------

                                        (Cycles)
                               Z80/Spectrum      C64             ratio
                               ------------      ---             -----
8x8->16 shift&add multiply     357/385           160/216/248     1.6-2.2
This is indeed the interesting part. They claim to have programmed a number of routines in both Z80A assembler (Spectrum column) and 6510 assembler (C64 column) to see the "ratio" of their respective clock cycle counts. We already saw that the Spectrum processor runs at 3.5 MHz, that is 3.56 times faster than the Commodore one.

That is, a routine requiring 3500 clock cycles on the Zx Spectrum will complete in one millisecond; a routine requiring only 982 clock cycles on the Commodore 64 will complete in the same time - one millisecond (yes, megahertz difference matters. To get "head-to-head", the "ratio" should be 3.56).


The "ratio" figure thus says that the Zx Spectrum is either "faster" (when the ratio is less than 3.56) or "much faster" (when the ratio is "much less" than 3.56); for example, if the ratio is 1.78 then it means that the Spectrum is twice as fast as the Commodore 64.
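Turning a cycle "ratio" into a real speed factor is a single division by the clock ratio. The sketch below assumes the two clock figures used in this article; the function name is made up for illustration.

```python
Z80_HZ = 3_500_000       # Zx Spectrum, as assumed in this article
MOS6510_HZ = 982_000     # European C64, as assumed in this article
CLOCK_RATIO = Z80_HZ / MOS6510_HZ   # about 3.56

def spectrum_speed_factor(cycle_ratio: float) -> float:
    """How many times faster the Spectrum finishes, given the
    Z80-cycles : 6510-cycles ratio of two equivalent routines.
    Above 1.0 the Spectrum wins, below 1.0 the C64 wins."""
    return CLOCK_RATIO / cycle_ratio

for cycle_ratio in (1.78, 3.0, CLOCK_RATIO, 4.07):
    factor = spectrum_speed_factor(cycle_ratio)
    print(f"cycle ratio {cycle_ratio:.2f} -> Spectrum speed x{factor:.2f}")
```

A cycle ratio of exactly 3.56 gives a factor of 1.0 (head-to-head); anything smaller means the Spectrum routine finishes first.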

Block mem copy                 39*x              13              3
                               21*x [1]                          1.62

Substring search [2] [3]
  init:                        29                4
  successful compare:          57                19              3
  advance next substr:         46+21*x           11+9*x          [4]
  advance and loop:            61                15              4.07
  compare last char:           40                18              2.22

Line routine [5]               73/111 [6]        30/33           2.92 [7]
                                                 29/37 [8]
                               24/72 (21/68)     6/30 (5/28)     2.7 (2.7)
                               49/71             21/25           2.6

Fast multiply [9]              100 (+7/3/27)     43/25 [10]      2.3-2.6/4.2-4.4

Sprites
String print                   None offered [11]
This is kinda weird to explain. Currently we are missing the assembly source for the Z80A routines; the Commodorian guy who presented the above table surely did well on the 6510 side, but... did he write optimal Z80 code as well? Or did he rely on some sub-optimal code (like the code in the teaching books)? I think he didn't write optimal code: in the first test he shows a "block mem copy" figure of "39*x", and then says that the "LDIR" version (the most common way of copying large blocks on Z80 processors) requires only "21*x". Thus the "39*x" figure comes from a largely sub-optimal example that has been placed there only to show some figure near the 3.56 "head-to-head" value...

The C64 has better "ratio" values only in two cases (fast multiply and a weirdly named "advance and loop" with a funny clockcycle count; more on that later).

In all other cases, the Spectrum shows the expected figures: 2.2-2.3 means that the Z80A of the Spectrum performs at 155%-162% of the Commodore 64 "fast" speed; the almost missing "LDIR" test performs at 220% of the C64 "fast memory access" speed...

Notes:
1. Using LDIR on Z80.

2. Given a list of null-terminated strings, find a particular string.
The substring search involves four main processes: successful character
compare, compare of last character, advance to next substring on mismatch,
and advance pointer and loop for next string. Init refers to initial
setup (trivial). "Advance to next substring" includes unsuccessful
character compare. "Advance and loop" counts cycles up to normal compare
loop.

3. The C64 version is 28 bytes. A 64 program would change this problem slightly
to improve performance (strings terminated with inverted dextral
character instead of null, etc.)

4. x=number of characters advanced. Ratios are 4.2, 3.35, 3.03, 2.87, 2.77,
2.7, 2.65, 2.61 for x=0,1,2,...
As expected, the Commodorian guy writing that "score" report appends some acrobatic wording trick to make the C64 appear "faster". The "inverted dextral" trick is also possible on the Z80A: in the Zx Spectrum Basic ROM you actually find it used for both the keyword list and the error text messages.

I guess that the Spectrum string routines timed by the Commodorian guy did not use the "CPIR" Z80 instruction (compare, increment and repeat), analogous to the "LDIR" (load, increment and repeat) above, and surely did not use the second set of Z80 registers either. Above we saw a "ratio" of 1.6 when using "LDIR" instead of a wordy, lengthy, redundant routine (note: the latter also kicked the Commodore 64 in the ass), so we can expect a large time improvement in the Z80 string routine figures as well, just by using "CPI", "CPIR" and the extra registers.
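Even with the figures exactly as published, note 4 can be checked against the clock ratio. The sketch below recomputes the "advance next substr" ratios from the two cycle formulas in the table (46+21*x for the Z80, 11+9*x for the 6510) and lists the values of x for which the C64 would really finish first; the names and the clock figures are assumptions of this article.

```python
Z80_HZ = 3_500_000       # Zx Spectrum, as assumed in this article
MOS6510_HZ = 982_000     # European C64, as assumed in this article
CLOCK_RATIO = Z80_HZ / MOS6510_HZ

def advance_ratio(x: int) -> float:
    """Cycle ratio of the 'advance next substr' step for x characters."""
    return (46 + 21 * x) / (11 + 9 * x)

ratios = [round(advance_ratio(x), 2) for x in range(8)]
c64_wins = [x for x in range(8) if advance_ratio(x) > CLOCK_RATIO]

print("ratios for x=0..7:", ratios)
print("x values where the C64 finishes first:", c64_wins)
```

Only x=0 lands above the 3.56 head-to-head value; as soon as a single character has to be skipped, the published figures already favour the Spectrum.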

5. Three separate Spectrum line routines were offered. All comparisons
are of equivalent routines/algorithms. The three routines are:
- slope<1, looped
- slope<1, unrolled across x-pixels, counting by columns
(more optimized version of the above)
- slope>1, unrolled across x-pixels
Spectrum times are Alvin's strange "average" cycles.
We wonder what those "Alvin's strange average cycles" are. Why? What does it mean?

Drawing a graphics line on the screen using a Z80 does not require memory variables: that is, all data - deltas, pixel address, and so on - is contained in the Z80 registers (and almost one set remains free for other usage). No stack usage, no memory accesses except writing pixels. In contrast, 6510 has to use lots of memory variables. Did "Alvin" at least imitate the "DRAW" routine contained in the Zx Spectrum Basic ROM?

Another question: the Spectrum display map is not linear because of some optimization for drawing text characters (remember, the Zx Spectrum does not have the ugly "text without graphics" mode of the C64). I wonder how many times a screen-address recalculation has been performed by someone unfamiliar with the Zx Spectrum display mapping. I guess that someone (especially Commodorians) may fall into that pitfall...

6. Using Ian Collier's revised algorithm.

7. "Average" cycle times consider one step in the x-direction followed
by one step in x and y.

8. Looped version, slope>1 (no Spectrum version offered).

9. Spectrum times are modified by +7 if a-b<0, +3 if a+b>255, and +27
if placed in a subroutine (17 for CALL, 10 for RET). Cycle ratios
assume inlined routine.

10. For multiplication of constant*vector (i.e. matrix multiply, or projection)
C64 version is 43 cycles for first multiply and 25 cycles for successive
multiplies. Thus, ratios are around 3.1 and 3.4 for two and three
successive multiplies.

11. Spectrum will choke, badly.
The considerations in notes 9 and 10 seem to come from some bad programmer who did not know Z80 flags, mixing signed/unsigned data, index register usage, 16-bit pointer arithmetic, etc. Again, without access to the original code, we cannot make assumptions, except that the Spectrum routines could be a real mess (judging by the funny comments in the section above).

The consideration in the note 11 shows the Commodorians' Fundamental Approach to Evidence Reporting.

Again (again and again) we see that Commodorian fans:

We also demonstrated what Commodorians do not even want to hear: