Amiga-Development


Author Topic: 68k Assembler help  (Read 20044 times)

Matt Hey

Re: 68k Assembler help
« Reply #15 on: April 08, 2013, 09:29:02 AM »

Matt,

Converting Frank's code to accept the single parameter in a0 was all that it took, that did the trick  8)

Excellent!

I don't think Frank wrote his assembler targeting 060 specifically, I'm pretty sure his Quake port was aimed at 040/060.  My Quake 2 port will never be able to run on an 040 so it's safe to specifically target an 060.

Scheduling instructions for the 68060 generally doesn't affect performance on the 68020-68040 (integer and FPU), and it's possible to write FPU code that is good for both the 68040 and the 68060. The 68040 does not implement the important FINT and FINTRZ instructions in hardware, though, so a separate build for it is still a good idea.

BTW, in Frank's code do you think both of these should be defined at the same time?

QDIV                    =       1
NICE_DIV                =       1

They are both defined in the original d_scan68k.s. It looks like any values are possible although if QDIV==0 then the NICE_DIV value doesn't matter. It should be faster on the 68060 to set NICE_DIV=0 although I don't know how this will affect the accuracy. The fastest on the 68060 should be QDIV=1 and NICE_DIV=0.

NovaCoder

Re: 68k Assembler help
« Reply #16 on: April 08, 2013, 11:54:20 AM »

Matt,

Just compared them on my 1200, full-screen 320x240

Frank's (version you compiled and attached): 8.7 FPS
Non-Frank version: 8.9 FPS

I could try to compile Frank's version with this (QDIV=1 and NICE_DIV=0) and see if that helps.

Otherwise any possible speed improvements you can make to the 'non-Frank' version would be appreciated  :)

NovaCoder

Re: 68k Assembler help
« Reply #17 on: April 08, 2013, 02:47:22 PM »

Ok, tried Frank's with QDIV=1 and NICE_DIV=0 but that didn't seem to make it any quicker, so I guess I'll stick with the other asm for Quake 2.  The 'other' asm was written by John Selck who I think may have once ported Duke Nukem to 68k.

Apart from improving speed, I also need a less precise version (only used for distant objects where 16-step precision isn't needed), this is how the function is written in C.

Code: [Select]
void D_DrawSpans32(espan_t *pspan)
{
    int count, spancount;
    unsigned char *pbase, *pdest;
    fixed16_t s, t, snext, tnext, sstep, tstep;
    float sdivz, tdivz, zi, z, du, dv, spancountminus1;
    float sdivz8stepu, tdivz8stepu, zi8stepu;

    sstep = 0; // keep compiler happy
    tstep = 0; // ditto

    pbase = (unsigned char *)cacheblock;

    sdivz8stepu = d_sdivzstepu * 32;
    tdivz8stepu = d_tdivzstepu * 32;
    zi8stepu = d_zistepu * 32;

    do
    {
        pdest = (unsigned char *)((byte *)d_viewbuffer +
                (r_screenwidth * pspan->v) + pspan->u);

        count = pspan->count;

        // calculate the initial s/z, t/z, 1/z, s, and t and clamp
        du = (float)pspan->u;
        dv = (float)pspan->v;

        sdivz = d_sdivzorigin + dv*d_sdivzstepv + du*d_sdivzstepu;
        tdivz = d_tdivzorigin + dv*d_tdivzstepv + du*d_tdivzstepu;
        zi = d_ziorigin + dv*d_zistepv + du*d_zistepu;
        z = (float)0x10000 / zi; // prescale to 16.16 fixed-point

        s = (int)(sdivz * z) + sadjust;
        if (s > bbextents)
            s = bbextents;
        else if (s < 0)
            s = 0;

        t = (int)(tdivz * z) + tadjust;
        if (t > bbextentt)
            t = bbextentt;
        else if (t < 0)
            t = 0;

        do
        {
            // calculate s and t at the far end of the span
            if (count >= 32)
                spancount = 32;
            else
                spancount = count;

            count -= spancount;

            if (count)
            {
                // calculate s/z, t/z, zi->fixed s and t at far end of span,
                // calculate s and t steps across span by shifting
                sdivz += sdivz8stepu;
                tdivz += tdivz8stepu;
                zi += zi8stepu;
                z = (float)0x10000 / zi; // prescale to 16.16 fixed-point

                snext = (int)(sdivz * z) + sadjust;
                if (snext > bbextents)
                    snext = bbextents;
                else if (snext < 32)
                    snext = 32; // prevent round-off error on <0 steps
                                //  from causing overstepping & running off the
                                //  edge of the texture

                tnext = (int)(tdivz * z) + tadjust;
                if (tnext > bbextentt)
                    tnext = bbextentt;
                else if (tnext < 32)
                    tnext = 32; // guard against round-off error on <0 steps

                sstep = (snext - s) >> 5;
                tstep = (tnext - t) >> 5;
            }
            else
            {
                // calculate s/z, t/z, zi->fixed s and t at last pixel in span (so
                // can't step off polygon), clamp, calculate s and t steps across
                // span by division, biasing steps low so we don't run off the
                // texture
                spancountminus1 = (float)(spancount - 1);
                sdivz += d_sdivzstepu * spancountminus1;
                tdivz += d_tdivzstepu * spancountminus1;
                zi += d_zistepu * spancountminus1;
                z = (float)0x10000 / zi; // prescale to 16.16 fixed-point
                snext = (int)(sdivz * z) + sadjust;
                if (snext > bbextents)
                    snext = bbextents;
                else if (snext < 32)
                    snext = 32; // prevent round-off error on <0 steps
                                //  from causing overstepping & running off the
                                //  edge of the texture

                tnext = (int)(tdivz * z) + tadjust;
                if (tnext > bbextentt)
                    tnext = bbextentt;
                else if (tnext < 32)
                    tnext = 32; // guard against round-off error on <0 steps

                if (spancount > 1)
                {
                    sstep = (snext - s) / (spancount - 1);
                    tstep = (tnext - t) / (spancount - 1);
                }
            }

            do
            {
                *pdest++ = *(pbase + (s >> 16) + (t >> 16) * cachewidth);

                s += sstep;
                t += tstep;
            } while (--spancount > 0);

            s = snext;
            t = tnext;

        } while (count > 0);

    } while ((pspan = pspan->pnext) != NULL);
}

As you can see, the only differences are that it uses a 32 span count instead of 16 and also the shift is different.


Code: [Select]
sstep = (snext - s) >> 5;
tstep = (tnext - t) >> 5;

Instead of this:

Code: [Select]
sstep = (snext - s) >> 4;
tstep = (tnext - t) >> 4;
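
To make the relationship between the two variants explicit, here is a minimal sketch in which the span length and the shift come from one constant, so they cannot drift apart. The helper name span_step and the shift parameter are made up for illustration and are not part of the Quake source.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed16_t;   /* 16.16 fixed point, as in the C function above */

/*
 * Hypothetical helper: per-pixel step across a full span of (1 << shift)
 * pixels. shift == 4 gives the 16-pixel variant, shift == 5 the 32-pixel one.
 * Like the original code, it relies on >> being an arithmetic shift when the
 * delta is negative (true for the usual 68k compilers, but not guaranteed
 * by the C standard).
 */
static fixed16_t span_step(fixed16_t start, fixed16_t next, int shift)
{
    assert(shift == 4 || shift == 5);
    return (next - start) >> shift;
}
```

With the same start and end coordinates, the 32-pixel variant takes half the step per pixel, which is why it only suits surfaces where the coarser perspective correction is not visible.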

I had a go at creating this asm function myself but of course I failed miserably  >:(

« Last Edit: April 09, 2013, 07:05:06 AM by NovaCoder »

Matt Hey

Re: 68k Assembler help
« Reply #18 on: April 14, 2013, 10:38:01 AM »

Ok, tried Frank's with QDIV=1 and NICE_DIV=0 but that didn't seem to make it any quicker, so I guess I'll stick with the other asm for Quake 2.  The 'other' asm was written by John Selck who I think may have once ported Duke Nukem to 68k.

I've looked a little closer at the assembler. The biggest reason why Frank's code was slower was instruction scheduling. An FDIV takes 37 cycles on the 68060, but the CPU can execute some 70-plus integer instructions in parallel with it as long as no other floating-point instruction is encountered, which can make quite a difference. Frank's algorithms are better in some places, and his code is much more readable and better documented.
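
The ordering that makes this overlap possible can be sketched in C. This is a simplified, hypothetical 1-D version (one texture coordinate, no clamping, no partial last span) and is not John's or Frank's code; draw_spans_overlapped and its parameters are invented for the example, and sdivz_step and zi_step are assumed to be per-span steps. A C compiler is free to reorder things, so the real benefit only comes from doing the same ordering by hand in assembler.

```c
#include <assert.h>
#include <stdint.h>

#define SPAN_SHIFT 4
#define SPAN_LEN   (1 << SPAN_SHIFT)

/* Hypothetical 1-D span drawer: the divide for the next span is issued
 * before the integer-only pixel loop of the current span. */
static void draw_spans_overlapped(uint8_t *dest, const uint8_t *texture,
                                  float sdivz, float zi,
                                  float sdivz_step, float zi_step, int spans)
{
    float z = 0.0f;
    int32_t s, snext, sstep;
    int i;

    /* Prime: coordinate at the start and at the far end of the first span.
     * These two divides cannot be hidden behind anything. */
    assert(zi > 0.0f);
    s = (int32_t)(sdivz * (65536.0f / zi));
    sdivz += sdivz_step;
    zi += zi_step;
    assert(zi > 0.0f);
    snext = (int32_t)(sdivz * (65536.0f / zi));

    while (spans-- > 0) {
        /* Assumes arithmetic >> for negative deltas, as the original does. */
        sstep = (snext - s) >> SPAN_SHIFT;

        /* Issue the divide for the NEXT span before drawing this one. */
        if (spans > 0) {
            sdivz += sdivz_step;
            zi += zi_step;
            assert(zi > 0.0f);
            z = 65536.0f / zi;
        }

        /* Integer-only inner loop: nothing in here waits on z. */
        for (i = 0; i < SPAN_LEN; i++) {
            *dest++ = texture[s >> 16];
            s += sstep;
        }

        /* Only now consume the result of the divide. */
        s = snext;
        if (spans > 0)
            snext = (int32_t)(sdivz * z);
    }
}
```

The point is only the order of operations: the floating-point result is not touched until after the pixel loop, so the divide has the whole loop to finish in the background.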

Apart from improving speed, I also need a less precise version (only used for distant objects where 16-step precision isn't needed), this is how the function is written in C.

...

As you can see, the only differences are that it uses a 32 span count instead of 16 and also the shift is different.

...

I had a go at creating this asm function myself but of course I failed miserably  >:(

Yea, it's not quite that easy to adapt. A DrawSpans32() would be bigger and slower. I probably wouldn't get it right the first time either. There are several speedups that are not exactly straightforward to understand. Assembler allows so much power and freedom that some advanced optimizations are difficult to follow but can be very interesting too.

I did try to improve the assembler DrawSpans16() as there is room for improvement. I renamed the function to _ass_DrawSpans16, so you will have to change the stub accordingly. It's still not fully optimized. There are a couple of integer divisions that could use a table like Frank's code does, but John's code uses some tricks I don't fully understand. Well, you can see if I broke anything so far. It's pretty easy to break something in a function this complicated, and I have no way to debug without AGA. It looks like your Quake 2 is already running at acceptable speeds from your latest video. It's probably not as peppy with a 68060@50MHz though.
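
For reference, the table idea looks roughly like this in C. It is a sketch of the general technique (multiply by a precomputed 16.16 reciprocal instead of dividing) and makes no claim about the exact layout or rounding of the table in Frank's code; recip_table, init_recip_table and div_by_table are names made up here.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DIVISOR 16   /* spancount - 1 never exceeds 15 in DrawSpans16 */

/* recip_table[n] = 65536 / n in 16.16 fixed point; entry 0 is unused. */
static int32_t recip_table[MAX_DIVISOR + 1];

static void init_recip_table(void)
{
    int n;
    for (n = 1; n <= MAX_DIVISOR; n++)
        recip_table[n] = 65536 / n;
}

/* Approximates delta / n with one multiply. The reciprocal is rounded down,
 * so the result can be slightly smaller in magnitude than the exact
 * quotient, which matches the "bias steps low" intent of the C code. */
static int32_t div_by_table(int32_t delta, int n)
{
    assert(n >= 1 && n <= MAX_DIVISOR);
    return (int32_t)(((int64_t)delta * recip_table[n]) / 65536);
}
```

The 64-bit intermediate matters: a 16.16 delta multiplied by a 16.16 reciprocal does not fit in 32 bits.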


NovaCoder

Re: 68k Assembler help
« Reply #19 on: April 14, 2013, 12:36:29 PM »

Hiya Matt,

Yes thanks, yep I'm finally getting Quake 2 running at a reasonable speed on my setup  8)

Don't worry about doing that DrawSpans32(), I did some more tests and the speed difference isn't really worth the hassle so I'll just stick with a single DrawSpans16 routine.

As for improving DrawSpans16, thanks I'll give it a try and see if it's any faster.   I was thinking that taking the table from Frank's code would be a good idea too, I think that might give the best speed.
« Last Edit: April 14, 2013, 12:59:56 PM by NovaCoder »

NovaCoder

Re: 68k Assembler help
« Reply #20 on: April 15, 2013, 05:10:28 AM »

Hiya Matt,

I tried '_ass_DrawSpans16' but that caused a crash, I could try to update John's code with your updates one at a time and see where the problem is but that will take me some time.


Matt Hey

Re: 68k Assembler help
« Reply #21 on: April 15, 2013, 03:46:21 PM »

I tried '_ass_DrawSpans16' but that caused a crash, I could try to update John's code with your updates one at a time and see where the problem is but that will take me some time.

Doh! I found a mistake. I ran out of data registers, so I used an address register where I would usually use a data register, but address register calculations don't set the condition codes (cc). Amateur mistake. I found some other little optimizations to make up for it ;). I'll attach the new version.

NovaCoder

Re: 68k Assembler help
« Reply #22 on: April 16, 2013, 12:54:00 AM »

Matt,

Tried the updated version, this one didn't crash anymore but the output isn't right, I'll attach a screen grab that I took.  If you can give me a list of the changes you made I can try them one at a time to the working copy?

Thanks,
Chris

Matt Hey

Re: 68k Assembler help
« Reply #23 on: April 16, 2013, 03:05:49 AM »

Tried the updated version, this one didn't crash anymore but the output isn't right, I'll attach a screen grab that I took.  If you can give me a list of the changes you made I can try them one at a time to the working copy?

Hmmm. I don't see anything else obvious. The assembler file is in my last post but it might not be easy to just replace some parts one at a time. I sure wish I could step through it with a debugger as I can see bugs as well as potential optimizations. I have assembled a test version with the old final part (with minor changes to make it work) and bounds/clipping of values commented out. The latter could cause some visuals to be messed up but you shouldn't end up with bars if it's working. The stub doesn't need to be changed as the function name is still _ass_DrawSpans16(). Let me know if it's working.

Veda

Re: 68k Assembler help
« Reply #24 on: April 16, 2013, 03:10:14 AM »

Just to rule some things out: the bars seem to occur every 16 pixels, so I have a feeling there is a pointer gone haywire.

NovaCoder

Re: 68k Assembler help
« Reply #25 on: April 16, 2013, 04:08:09 AM »

Hmmm. I don't see anything else obvious. The assembler file is in my last post but it might not be easy to just replace some parts one at a time. I sure wish I could step through it with a debugger as I can see bugs as well as potential optimizations. I have assembled a test version with the old final part (with minor changes to make it work) and bounds/clipping of values commented out. The latter could cause some visuals to be messed up but you shouldn't end up with bars if it's working. The stub doesn't need to be changed as the function name is still _ass_DrawSpans16(). Let me know if it's working.

Hiya,

This new version doesn't crash either but now it doesn't produce anything on the screen at all (screen remains black).

Any chance you could debug this on WinUAE?

Actually, what am I talking about, I can just build an RTG version for you if you're happy to debug it?

As you've got AmiDevCpp you can even do your own test builds if I send you my whole workspace :)

How much memory do you have in your real Amiga  BTW?
« Last Edit: April 16, 2013, 04:12:38 AM by NovaCoder »

Matt Hey

Re: 68k Assembler help
« Reply #26 on: April 16, 2013, 04:13:58 AM »

Just to rule some things out: the bars seem to occur every 16 pixels, so I have a feeling there is a pointer gone haywire.

Or incorrect clipping/bounds checking. I commented out the clipping in the test version, so we will see. A haywire pointer usually causes a crash, but not always. It could also be overflow from an incorrect size or sign conversion (called casting in C). I'm looking but haven't spotted it yet.
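
As an illustration of the size/sign conversion kind of bug (not a diagnosis of the actual problem in the assembler file), here is what happens in C terms when a negative 16-bit value is widened the wrong way. On the 68k the first case corresponds to sign-extending with EXT.L, the second to moving the word into a cleared register. The function names are invented for this example.

```c
#include <stdint.h>

/* Correct: sign-extend a 16-bit step to 32 bits, so -3 stays -3. */
static int32_t widen_signed(int16_t step)
{
    return (int32_t)step;
}

/* Buggy: the same bits zero-extended, so -3 becomes 65533 and a small
 * backwards step turns into a huge forwards one. */
static int32_t widen_unsigned(int16_t step)
{
    return (int32_t)(uint16_t)step;
}
```

Positive values come out the same either way, which is why this sort of mistake can go unnoticed until a step or offset happens to be negative.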


Matt Hey

Re: 68k Assembler help
« Reply #27 on: April 16, 2013, 04:37:10 AM »

This new version doesn't crash either but now it doesn't produce anything on the screen at all (screen remains black).

I was expecting bars or working. Strange. That doesn't really tell me anything.

Any chance you could debug this on WinUAE?

I doubt BDebug would work without turning JIT off and that's really slow. Slow doesn't hurt while I'm debugging but it wastes a lot of time to get to that point.

Actually, what am I talking about, I can just build an RTG version for you if you're happy to debug it?

Is it that easy? I have tried to promote some of your other AGA programs with ModePro set to use planar data but it still failed with an error message.

As you've got AmiDevCpp you can even do your own test builds if I send you my whole workspace :)

True. I really need to set up a dedicated desktop as my laptop doesn't have much HD space left. I have a Pentium 4 tower at 2.8GHz that was supposedly defective but seems to be working now. I may need a HD and monitor though. I would like to play with AROS on it too.

How much memory do you have in your real Amiga  BTW?

Fast Ram:
64+16MB on the accelerator (128MB is max but my socketed oscillator sticks up blocking a SIMM), 16MB on the motherboard and 16MB from the Voodoo 4 (has 32MB but the Elbox software seems to have a bug). It's about 105k on boot but it's slower after the first 80MB is used.

« Last Edit: April 16, 2013, 04:45:11 AM by Matt Hey »

NovaCoder

Re: 68k Assembler help
« Reply #28 on: April 16, 2013, 05:15:31 AM »


Is it that easy? I have tried to promote some of your other AGA programs with ModePro set to use planar data but it still failed with an error message.

Yep, it's pretty easy. I already have CyberGFX versions of all my AGA code, so it's pretty much a copy-and-paste.

Let me know if you want me to either build you an RTG debug version of Quake 2 (obviously it will run a bit slower) or if you just want me to send the entire Workspace to you so that you can build it yourself.   I can also add a console command so you can switch between the old and the new assembler drawspan functions on the fly.
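
Switching implementations on the fly usually comes down to calling through a function pointer that the console command flips. A minimal sketch, assuming two stand-in functions with the same signature as the real routine; drawspans_old, drawspans_new, d_drawspans and cmd_toggle_drawspans are hypothetical names, and hooking the handler into the Quake 2 console is left out.

```c
#include <stddef.h>

typedef struct espan_s espan_t;              /* span list type from the renderer */
typedef void (*drawspans_fn)(espan_t *pspan);

/* Hypothetical stand-ins for the old and new assembler routines; they only
 * record which one ran so the switch can be observed. */
static int last_called;
static void drawspans_old(espan_t *pspan) { (void)pspan; last_called = 1; }
static void drawspans_new(espan_t *pspan) { (void)pspan; last_called = 2; }

/* The renderer calls through this pointer instead of a fixed function. */
static drawspans_fn d_drawspans = drawspans_old;

/* Hypothetical console command handler: flips between the two routines. */
static void cmd_toggle_drawspans(void)
{
    d_drawspans = (d_drawspans == drawspans_old) ? drawspans_new
                                                 : drawspans_old;
}
```

The indirect call costs a few cycles per surface, which should be negligible next to the span drawing itself, and it makes A/B comparisons of speed and output possible within a single run.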
« Last Edit: April 16, 2013, 05:17:04 AM by NovaCoder »

Team Chaos Leader

Re: 68k Assembler help
« Reply #29 on: April 16, 2013, 11:29:12 AM »

... 16MB from the Voodoo 4 (has 32MB but the Elbox software seems to have a bug). It's about 105k on boot but it's slower after the first 80MB is used.

If your Voodoo 4 has 2 separate banks of 16MB then there is no way for P96 to use both banks. So you can only get 1 bank of 16MB for your gfx card, and the other 16MB can be used as FakeFastRam or HiSpeedPCI_DMA_RAM.

So probably not an Elbox bug.