@Daytona
I'm not sure I appreciate your tone. I didn't write the allocator, I maintain it. I have also fixed a significant number of bugs and fragmentation issues that plagued lower-end cards. The speculation that the allocator was unable to satisfy the request was ultimately correct, even if it wasn't a fragmentation issue in this case. Fragmentation is an issue I have encountered *many* times previously, on at least 3 different GPU types. When this thread first came up I had no access to the source code to make a more detailed assessment, and even now I have no access to a working OS4.1/Radeon installation to actually test anything.
We are talking about code that, according to SVN, I have not looked at since 2013-06-20. If you want more commitment, feel free to hire me. My contracting rates are quite reasonable. I'd even give you a discount. Call it 450 UKP/day?
Daytona675x wrote: Okay, so the whole story is like this, if I get you right:
1. you (ab)use p96AllocBitmap to get aligned VRAM
Yes. As I stated from the very beginning.
2. for whatever reason you are forced to ask it for a bitmap with the depth of the w3d-context (why btw.? you just want memory from it, why not ask it for a 32bit bitmap that doesn't force you to use a height beyond limits? After all it's just the memory you want, I'd think that p96AllocBitmap doesn't care what you later do with that?)
OK, you go right ahead and do that; make sure it's BMF_DISPLAYABLE, 32-bit deep always. Problem solved, right?
Wrong. Your requested 32-bit BitMap almost always ends up in FAST RAM, ready to be paged into VRAM only when RTG decides it's necessary to do so. You might get lucky sometimes; most of the time you won't. What are you going to do now? Your graphics card will at best render a load of garbage, or it will simply hang your entire system. You think this approach was never attempted before?
As I have explained more than once in this thread, the ONLY way to guarantee that you get a VRAM-resident block of memory from the RTG system at the point of allocation is to provide a friend BitMap. The friend BitMap in this case is W3D_Context->drawregion. As soon as you provide a friend BitMap, its format overrides whatever format you ask for; only the width and height of your request are respected. For BitMaps this is not normally a problem, because you aren't supposed to care about the depth: you aren't supposed to be manipulating the BitMap's contents directly. You have graphics.library and things like WritePixelArray for that.
3. because you ask for a 16bit bitmap in case of a 16bit W3D context you'll always end up asking for a bitmap at least being 2048 x 4096 internally if you want RAM for a 2048 x 2048 texture, even if no further alignment was necessary.
4. apparently this > 4096 height is the reason for the fail.
Apparently so. I don't know the exact limits of what is allowed, but I suspect it will be the same as the maximum screen size you are allowed.
5. and that >4096 height only happens because you say that you don't increase the width beyond 2048 instead (and the reason given is that the Permedia2 and maybe some others cannot handle more; fine, but what about Radeons? So you're saying that to satisfy Permedia2 etc. you also cripple Radeons? Why not increase that limit for cards that can? Presumably they won't laugh at you if you ask for a bitmap where both width and height are smaller than 4096.)
6. on 32-bit contexts it failed too for version 53.10. I assume there was a bug in your "complex" calculations which is now fixed?
There was a bug in which the upper 2048 was not reached in every situation. That is what 52.11 fixed, when I eventually got my cross SDK reinstalled. Why do you think increasing the width past 2048 will help? That already won't work on most of the supported hardware and probably won't work on R200 either.
If you want to help, why not write a test program yourself to ascertain the maximum hardware width you can achieve with a genuinely VRAM resident BitMap and I can write some special case handling for when we will exceed 2048x2048 on that basis? After all, there clearly isn't a problem when we are allocating lower sizes.
So, what's with 16bit contexts? Sounds like you cannot deliver a fix unless you move away from abusing p96AllocBitmap (or work around (2) and / or (5)).
Is that going to happen or not? Will there be a fix or not?
Better still, don't use AllocBitMap. I can guarantee the moment you start lifting the limits it places on displayable width and height you'll get other problems elsewhere. What we need is a raw VRAM allocator as described previously. The sort of functionality that RadeonHD.resource provides. Feel free to develop it and I'll reimplement the W3D_Picasso96.library to use it.
If you are not able to fix it because you a) don't know how to implement your own allocator and b) have to ask for a 16bit bitmap with a max width of 2048 on a 16bit context
then there's apparently no solution to this problem, at least none you can deliver.
Sarcasm much? I have probably written more allocators than you would think, using many different strategies. I didn't write this one, however. I have fixed many bugs and inefficiencies within it, but this one remains and you are correct in your ultimate assessment that there isn't a lot I can do about it.
It's highly likely that RTG takes one look at the request for a 4100 pixel tall BMF_DISPLAYABLE BitMap, and once it stops laughing, returns NULL.
Yes, and if you knew that from the beginning, why all the talk and speculating and whatever beforehand? And why do you write an algorithm that asks for sizes you know will fail? And if you knew that there's apparently no way to create a 2048 x 2048 texture on a 16bit context the way you implemented things, then why didn't you do the following:
I suggest at least fixing the return values of W3D_Query W3D_Q_MAXTEXWIDTH / W3D_Q_MAXTEXHEIGHT accordingly, so that they don't falsely return 2048 x 2048 on a 16-bit context but 1024 x 1024 instead!
Yet there are plenty of situations in which 2048 is fine. Most 8-bit and 16-bit texture formats would work. The specific failure case is 2048x2048x32 on 16-bit contexts. Earlier you were berating me (wrongly) for apparently restricting the allocator to care only about the non-Radeon underclass, and suddenly you want to reduce functionality for everybody who isn't affected by your specific use case? W3D_Q_MAXTEX* is for querying the hardware limits. It returns 2048 on Permedia2 also, but you physically could not allocate a 32-bit texture that size, as it would exceed the memory available to the device by a factor of 2. There is an environment variable to let you override it to a smaller value, to force applications to respect it. Guess how much it helped? Almost nothing cared.
p.s.:
I just did some more heavy testing with our latest game, which uses quite a few textures, and I forced frequent recreation of all textures: deletion here, creation there, changing here, changing there. With 53.11 debug it never fails to give me my 2048x2048 texture (if on a 32-bit context; otherwise it always fails, of course).
So I'd say that also indicates that your bug was simply a miscalculation of yours (double height?) and that memory fragmentation was not an issue at all (at least not on a system with >32 MB VRAM), as I guessed earlier in this thread.
Yes, yes, you are very perceptive. I expect a fix any day now.