
Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Posted: Thu May 21, 2015 9:40 pm
by Karlos
Early indications are that 53.11 fixes a similar issue reported here:

http://www.amigans.net/modules/xforum/v ... &start=620

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Posted: Fri May 22, 2015 11:38 am
by Daytona675x
@Karlos
I just tested your debug build; sadly it simply crashes right on startup, apparently already during W3D context setup (I didn't have the time to check it out in more detail).
Tested not just with my stuff but also with some Warp3D applications found on os4depot etc., which all work with the normal lib.
OS4.1 FE, practically clean OS install, latest updates, sam460ex, Radeon 9250.
Since 2048 textures don't work reliably, I simply assume in my W3D apps that the max. texture size is 1024x1024 if the driver reports something like 2048x2048.
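(For reference, a minimal sketch of that defensive clamp, assuming W3D_Query() with W3D_Q_MAXTEXWIDTH / W3D_Q_MAXTEXHEIGHT returns the driver's reported limit as a number, as discussed later in this thread; check the exact prototype and whether the third argument may be NULL against the Warp3D autodocs. The 1024 cap is just my own safety margin.)

#include <Warp3D/Warp3D.h>
#include <proto/Warp3D.h>

/* Clamp whatever the driver reports to a size that has proven reliable here;
 * treat anything above 1024 with suspicion. */
static ULONG SafeMaxTexSize(W3D_Context *ctx)
{
    ULONG w = W3D_Query(ctx, W3D_Q_MAXTEXWIDTH,  NULL);
    ULONG h = W3D_Query(ctx, W3D_Q_MAXTEXHEIGHT, NULL);
    ULONG m = (w < h) ? w : h;

    return (m > 1024) ? 1024 : m;
}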

@Hans
This behaviour doesn't make sense. I'm nowhere near as familiar with the Warp3D internals as Karlos, but my impression was that the allocator doesn't care in the slightest what the width and height are; only the total allocation size matters.
Of course it doesn't make sense, that's why it's a bug report ;) Anyway, that's what's happening, feel free to check it out yourself. Besides that: apparently he uses width and height for some alloc-size calculations. If the bug is inside those (and it smells like it), then width and height may very well matter.

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Posted: Fri May 22, 2015 3:15 pm
by Karlos
Can you please report the full versions of all Warp3D / Picasso96 components in your installation and the serial / debug buffer output obtained during the crash?

As the same driver works elsewhere and I don't have a comparable system to test on, there isn't a lot to go on here.

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Posted: Fri May 22, 2015 4:19 pm
by Daytona675x
Hehehe, nevermind :)

The reason for everything crashing was that your tar archive contains TWO files named W3D_Picasso96.library, one being the real lib and the other being a 0-byte file, probably a leftover from a symbolic link inside your dev folder?
Now guess which one got extracted here by my decompressor (funny enough: it's the built-in decompressor of Directory Opus 10 for Windows, which I use as an Explorer replacement ;) it extracts and shows only the 0-byte version without any further warning...)

Okay, so I just retested with the REAL library file...
The issue is still happening :(

2048x2048 W3D_R8G8B8A8:
[Warning ] [MEM_AllocMem] p96AllocBitMap(2048, 4100,...) returned null

Is the 4100 the height you computed for that allocation? What are you doing?
My lib / sys versions:

Kickstart 53.70, Workbench 53.14
Warp3D.library 53.22
Warp3DPPC.library 4.3
W3D_R200.library 53.26
W3D_Radeon.library 53.21
W3D_Picasso96.library 53.11 (your active debug lib)
W3D_Picasso96.library 53.10 (my former version)

p.s.: I played with my calculator. Assuming I need 4 KB alignment of the whole buffer and an additional 64-byte alignment for each row, I end up with a p96AllocBitmap(2048, 2065, 32) call. Not 2048 x 4100.

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Posted: Fri May 22, 2015 6:15 pm
by Karlos
Hi,

The height is being calculated on the basis of the linear size that was requested and a maximum allowed width of 2048. If the allocated bitmap is 32 bits deep, then you are correct in asserting that it should be only ~2048 rows tall (plus alignment and any other required slop). However, you cannot assume the bitmap is 32 bits deep. It will be the same depth as whatever BitMap is used for the W3D_Context.

On a 16-bit display, if you request storage for a 2Kx2Kx32-bit texture, you're going to get a BitMap of 2Kx4Kx16-bits.
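To spell out the arithmetic: a 2048x2048 32-bit texture is 2048 x 2048 x 4 = 16,777,216 bytes. Expressed as rows of a 16-bit, 2048-pixel-wide BitMap (4096 bytes per row), that is 16,777,216 / 4096 = 4096 rows, and a few extra rows of alignment slop take you to the p96AllocBitMap(2048, 4100, ...) request seen in the warning above.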

Can you repeat your test and confirm what display depth is used?

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Posted: Sat May 23, 2015 9:46 am
by Daytona675x
Indeed, on a 16 bit display it fails and on a 32 bit display it doesn't.
So, what's the problem?
Is your alloc call failing on a 16-bit screen because of the > 4096 height your calculations end up with? Is there a width/height limit in p96AllocBitmap that you run into?

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Posted: Sat May 23, 2015 1:57 pm
by Karlos
Daytona675x wrote:Indeed, on a 16 bit display it fails and on a 32 bit display it doesn't.
So, what's the problem?
Is your alloc call failing on a 16-bit screen because of the > 4096 height your calculations end up with? Is there a width/height limit in p96AllocBitmap that you run into?
Yes. As I suggested somewhere earlier in the thread. It's highly likely that RTG takes one look at the request for a 4100 pixel tall BMF_DISPLAYABLE BitMap, and once it stops laughing, returns NULL.

Also, they're not just "my calculations"; the rounding and alignment are the result of what works across multiple graphics chips. For example, you mention earlier the "64 byte alignment" for row width. This is not a universal value, just a commonly encountered one. The Permedia2, for example, cannot (as was assumed in an earlier version of its RTG driver) handle any arbitrary multiple of 32 as a width. It uses three 3-bit partial products that impose increasingly severe alignment restrictions as you allocate wider bitmaps. Early attempts at making the OS4 Warp3D driver failed specifically because such assumptions had been made in the 2D driver and the hardware rasterization functions simply couldn't work on bitmaps of arbitrary multiple-of-32 widths. I lost count of the number of utterly hard cold-reboot freezes it took to get that working.

It turns out that the only BitMap widths that work (near) universally across the supported chips are powers of 2 (within the range 32 - 2048, anyway), and that's what the W3D_P96 driver attempts to obtain. So, a driver asks for N+A bytes, where A is some 3D-driver-specific texture alignment requirement. The W3D_P96 driver takes this N+A value, uses its square root as a starting point, and then picks the nearest power of 2 that's equal to or larger than that as the width (up to 2048). The height is then a function of the requested size and the depth that is forced upon us by the RTG system based on whatever the W3D_Context's display depth is. The final height is also subject to post-rounding for various other RTG-esque reasons. You then get a very specific and sometimes weird-looking BitMap dimension request. When the texture depth and the display BitMap depth are the same, the requested dimensions are a lot saner and more likely to work.
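To make that recipe concrete, here is a rough sketch in C of the dimension selection just described (an illustration only, not the actual W3D_Picasso96.library code; the names are invented and the real driver's final rounding differs):

/* Sketch of the dimension selection described above (not the library code).
 * 'request' is N+A bytes; 'bytesPerPixel' is dictated by the depth of the
 * W3D_Context's BitMap, not by the texture's own format. */
static void PickBitMapSize(unsigned long request, unsigned long bytesPerPixel,
                           unsigned long *width, unsigned long *height)
{
    unsigned long w = 32;

    /* nearest power of two >= sqrt(request), clamped to the 32..2048 range */
    while (w < 2048 && (w * w) < request) {
        w <<= 1;
    }

    *width  = w;
    *height = (request + (w * bytesPerPixel) - 1) / (w * bytesPerPixel);
    /* the real driver applies further RTG-related rounding on top of this */
}

Plugging in the failing case: request is roughly 16 MiB, bytesPerPixel is 2 on a 16-bit context, so the width caps at 2048 and the height comes out above 4096.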

We have to do all this because there's no "hey, I'd like X bytes of guaranteed VRAM allocated contiguously with the following alignment" function call. The reason that the W3D_SI drivers don't suffer from this is that the RadeonHD.resource provides this type of functionality.

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Posted: Sat May 23, 2015 6:19 pm
by Daytona675x
Okay, so the whole story is like this if I get you right:

1. you (ab)use p96AllocBitmap to get aligned VRAM

2. for whatever reason you are forced to ask it for a bitmap with the depth of the W3D context (why, btw.? You just want memory from it, so why not ask it for a 32-bit bitmap that doesn't force you into a height beyond the limits? After all, it's just the memory you want; I'd think that p96AllocBitmap doesn't care what you later do with it?)

3. because you ask for a 16-bit bitmap in the case of a 16-bit W3D context, you'll always end up asking for a bitmap that is internally at least 2048 x 4096 if you want RAM for a 2048 x 2048 texture, even if no further alignment were necessary.

4. apparently this > 4096 height is the reason for the failure.

5. and that > 4096 height only happens because, as you say, you don't increase the width beyond 2048 instead (the reason given is that the Permedia2 and maybe some others cannot handle more; fine, but what about Radeons? So you're saying that to satisfy the Permedia2 etc. you also cripple Radeons? Why not raise that limit for cards that can handle it? They probably won't laugh at you if you ask for a bitmap where both width and height are smaller than 4096.)

6. on 32-bit contexts it failed too with version 53.10. I assume there was a bug in your "complex" ;) calculations which is now fixed?

So, what about 16-bit contexts? It sounds like you cannot deliver a fix unless you move away from abusing p96AllocBitmap (or work around (2) and/or (5)).
Is that going to happen or not? Will there be a fix or not?

If you are not able to fix it because you a) don't know how to implement your own allocator and b) have to ask for a 16bit bitmap with a max width of 2048 on a 16bit context
then there's apparently no solution to this problem, at least none you can deliver.
It's highly likely that RTG takes one look at the request for a 4100 pixel tall BMF_DISPLAYABLE BitMap, and once it stops laughing, returns NULL.
Yes, and if you knew that from the beginning, why all the talk and speculating and whatever beforehand? And why do you write an algorithm that asks for sizes you know will fail? And if you knew that there's apparently no way to create a 2048 x 2048 texture on a 16bit context the way you implemented things, then why didn't you do the following:

I suggest at least fixing the return values of W3D_Query for W3D_Q_MAXTEXWIDTH / W3D_Q_MAXTEXHEIGHT accordingly, so that they don't falsely return 2048 x 2048 on a 16-bit context but 1024 x 1024 instead!

p.s.:
I just did some more heavy testing with our latest game, which uses quite a few textures, and I forced frequent recreation of all textures: deletion here, creation there, changing here, changing there. With 53.11 debug it never fails to give me my 2048x2048 texture (if on a 32-bit context; otherwise it always fails, of course).
So I'd say that also indicates that your bug was simply a miscalculation of yours (double height?) and that memory fragmentation was not an issue at all (at least not on a system with >32 MB VRAM), as I guessed earlier in this thread.

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Posted: Sun May 24, 2015 11:56 am
by Karlos
@Daytona

I'm not sure I appreciate your tone. I didn't write the allocator, I maintain it. I also fixed a significant number of bugs and fragmentation issues that plagued lower-end cards. The speculation that the allocator was unable to satisfy the request was ultimately correct, even if it wasn't a fragmentation issue in this case. Fragmentation is an issue I have encountered *many* times previously on at least 3 different GPU types. When this thread first came up I had no access to the source code to make a more detailed assessment, and even now I have no access to a working OS4.1/Radeon installation to actually test anything.

We are talking about code that, according to SVN, I have not looked at since 2013-06-20. If you want more commitment, feel free to hire me. My contracting rates are quite reasonable. I'd even give you a discount. Call it 450 UKP/day?
Daytona675x wrote:Okay, so the whole story is like this if I get you right:

1. you (ab)use p96AllocBitmap to get aligned VRAM
Yes. As I stated from the very beginning.
2. for whatever reason you are forced to ask it for a bitmap with the depth of the W3D context (why, btw.? You just want memory from it, so why not ask it for a 32-bit bitmap that doesn't force you into a height beyond the limits? After all, it's just the memory you want; I'd think that p96AllocBitmap doesn't care what you later do with it?)
OK, you go right ahead and do that; make sure it's BMF_DISPLAYABLE, 32-bit deep always. Problem solved, right?

Wrong. Your requested 32-bit BitMap almost always ends up in FAST RAM, ready to be paged in only when RTG decides it's necessary to do so. You might get lucky sometimes; most of the time you won't. What are you going to do now? Your graphics card will at best render a load of garbage, or it will simply hang your entire system. You think this approach was never attempted before?

As I have explained more than once in this thread, the ONLY way to guarantee that you get a VRAM-resident block of memory from the RTG system at the point of allocation is to provide a friend BitMap. The friend BitMap in this case is W3D_Context->drawregion. As soon as you provide a friend BitMap, its format overrides whatever format you ask for, and only the width and height of your request are respected. For BitMaps this is not normally a problem, because you aren't supposed to care what depth a BitMap is: you aren't supposed to be manipulating its content directly. You have the graphics library and things like WritePixelArray for that.
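For illustration, this is roughly what such a friend-BitMap allocation looks like (a sketch based on the description above, not the library's actual code; it assumes the usual Picasso96API prototype for p96AllocBitMap(), and, as noted, the friend's format wins regardless of the depth/format arguments):

#include <proto/Picasso96API.h>
#include <graphics/gfx.h>

/* Sketch: ask RTG for width x height pixels of VRAM-resident storage by
 * passing the context's draw region as a friend BitMap.  Whatever depth and
 * RGB format we pass is overridden by the friend's format; only the width
 * and height of the request are respected. */
struct BitMap *AllocTextureStorage(struct BitMap *drawregion,
                                   ULONG width, ULONG height)
{
    return p96AllocBitMap(width, height, 32,
                          BMF_DISPLAYABLE,   /* displayable, as in the 4100-row warning above */
                          drawregion,        /* friend: forces VRAM + its pixel format        */
                          RGBFB_NONE);       /* overridden when a friend BitMap is given      */
}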
3. because you ask for a 16-bit bitmap in the case of a 16-bit W3D context, you'll always end up asking for a bitmap that is internally at least 2048 x 4096 if you want RAM for a 2048 x 2048 texture, even if no further alignment were necessary.

4. apparently this > 4096 height is the reason for the failure.
Apparently so. I don't know the exact limits of what is allowed, but I suspect it will be the same as the maximum screen size you are allowed.
5. and that > 4096 height only happens because, as you say, you don't increase the width beyond 2048 instead (the reason given is that the Permedia2 and maybe some others cannot handle more; fine, but what about Radeons? So you're saying that to satisfy the Permedia2 etc. you also cripple Radeons? Why not raise that limit for cards that can handle it? They probably won't laugh at you if you ask for a bitmap where both width and height are smaller than 4096.)
6. on 32-bit contexts it failed too with version 53.10. I assume there was a bug in your "complex" ;) calculations which is now fixed?
There was a bug in which the upper 2048 was not reached in every situation. That is what 53.11 fixed, once I eventually got my cross SDK reinstalled. Why do you think increasing the width past 2048 will help? That already won't work on most of the supported hardware and probably won't work on the R200 either.

If you want to help, why not write a test program yourself to ascertain the maximum hardware width you can achieve with a genuinely VRAM-resident BitMap, and I can write some special-case handling for when we would exceed 2048x2048 on that basis? After all, there clearly isn't a problem when we are allocating smaller sizes.
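Something along these lines would do as a starting point (a sketch only; it assumes the Picasso96API calls p96AllocBitMap()/p96GetBitMapAttr()/p96FreeBitMap() and the P96BMA_ISINVRAM attribute, and 'screenBM' stands in for the BitMap of the target screen used as the friend):

#include <proto/Picasso96API.h>
#include <graphics/gfx.h>
#include <stdio.h>

/* Probe sketch: find the widest friend-allocated BitMap that RTG hands back
 * genuinely resident in VRAM.  'screenBM' plays the role of
 * W3D_Context->drawregion. */
ULONG ProbeMaxVRAMWidth(struct BitMap *screenBM, ULONG height)
{
    ULONG width, best = 0;

    for (width = 2048; width <= 16384; width <<= 1) {
        struct BitMap *bm = p96AllocBitMap(width, height, 32,
                                           BMF_DISPLAYABLE, screenBM, RGBFB_NONE);
        if (bm == NULL) {
            printf("width %lu: allocation failed\n", (unsigned long)width);
            break;
        }
        if (p96GetBitMapAttr(bm, P96BMA_ISINVRAM)) {
            best = width;
            printf("width %lu: allocated and VRAM resident\n", (unsigned long)width);
        } else {
            printf("width %lu: allocated but NOT in VRAM\n", (unsigned long)width);
        }
        p96FreeBitMap(bm);
    }
    return best;
}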
So, what about 16-bit contexts? It sounds like you cannot deliver a fix unless you move away from abusing p96AllocBitmap (or work around (2) and/or (5)).
Is that going to happen or not? Will there be a fix or not?
Better still, don't use AllocBitMap at all. I can guarantee that the moment you start lifting the limits it places on displayable width and height, you'll get other problems elsewhere. What we need is a raw VRAM allocator as described previously, the sort of functionality that RadeonHD.resource provides. Feel free to develop it and I'll reimplement the W3D_Picasso96.library to use it.
If you are not able to fix it because you a) don't know how to implement your own allocator and b) have to ask for a 16bit bitmap with a max width of 2048 on a 16bit context
then there's apparently no solution to this problem, at least none you can deliver.
Sarcasm much? I have probably written more allocators than you would think, using many different strategies. I didn't write this one, however. I have fixed many bugs and inefficiencies within it, but this one remains and you are correct in your ultimate assessment that there isn't a lot I can do about it.
It's highly likely that RTG takes one look at the request for a 4100 pixel tall BMF_DISPLAYABLE BitMap, and once it stops laughing, returns NULL.
Yes, and if you knew that from the beginning, why all the talk and speculating and whatever beforehand? And why do you write an algorithm that asks for sizes you know will fail? And if you knew that there's apparently no way to create a 2048 x 2048 texture on a 16bit context the way you implemented things, then why didn't you do the following:

I suggest at least fixing the return values of W3D_Query for W3D_Q_MAXTEXWIDTH / W3D_Q_MAXTEXHEIGHT accordingly, so that they don't falsely return 2048 x 2048 on a 16-bit context but 1024 x 1024 instead!
Yet there are plenty of situations in which 2048 is fine. Most 8-bit and 16-bit texture formats would work. The specific problem case is 2048x2048x32 on 16-bit contexts. Earlier you were berating me (wrongly) for apparently restricting the allocator to care only about the non-Radeon untermensch, and suddenly you want to reduce functionality for everyone who isn't affected by your specific use case? W3D_Q_MAXTEXWIDTH/HEIGHT is for querying the hardware limits. It returns 2048 on the Permedia2 as well, but you physically could not allocate a 32-bit texture of that size, as it would exceed the memory available to the device by a factor of 2. There is an environment variable to let you override it with a smaller value and force applications to respect it. Guess how much it helped? Almost nothing cared.
p.s.:
I just did some more heavy testing with our latest game, which uses quite a few textures, and I forced frequent recreation of all textures: deletion here, creation there, changing here, changing there. With 53.11 debug it never fails to give me my 2048x2048 texture (if on a 32-bit context; otherwise it always fails, of course).
So I'd say that also indicates that your bug was simply a miscalculation of yours (double height?) and that memory fragmentation was not an issue at all (at least not on a system with >32 MB VRAM), as I guessed earlier in this thread.
Yes, yes, you are very perceptive. I expect a fix any day now.

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Posted: Sun May 24, 2015 12:22 pm
by Karlos
Since the underlying problem is unlikely to be fixed any time soon, and the W3D_Q_MAXTEXWIDTH/HEIGHT queries aren't context-aware beyond what is potentially possible on 8-bit (deprecated) or RGB render targets, I propose the following:

1) Cause W3D_AllocTexObject() to fail when attempting to reserve storage for a texture that is likely to exceed the limitations implied by the render target (i.e., a 2Kx2Kx32-bit texture on a 16-bit render target) in the Radeon drivers; see the sketch at the end of this post.

2) Add a W3D_RuntimeQuery() or similar that can report the actual limitations, taking into account the present state of the system.

The latter is a better solution but as it's new, nothing will be aware of it.
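For what it's worth, a rough sketch of the kind of guard proposal 1 describes (an illustration only, not driver code; the names are hypothetical and the real check would live in the Radeon drivers' texture allocation path):

/* Hypothetical guard for proposal 1: refuse texture storage requests whose
 * friend-BitMap allocation would imply a height beyond what RTG accepts.
 * 'requestBytes' is the texture size plus the 3D driver's alignment slop,
 * i.e. the N+A value discussed earlier in the thread. */
#define MAX_RTG_BITMAP_WIDTH   2048UL
#define MAX_RTG_BITMAP_HEIGHT  4096UL   /* assumed limit, per the discussion above */

static int TextureFitsRenderTarget(unsigned long requestBytes,
                                   unsigned long contextBytesPerPixel)
{
    unsigned long rowBytes = MAX_RTG_BITMAP_WIDTH * contextBytesPerPixel;
    unsigned long rows     = (requestBytes + rowBytes - 1) / rowBytes;

    /* 2Kx2Kx32 plus slop on a 16-bit context comes to ~4100 rows, so it
     * gets rejected here instead of failing later in p96AllocBitMap. */
    return rows <= MAX_RTG_BITMAP_HEIGHT;
}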