Title: OpenCL -- GPU acceleration
Post by: AUDIODIDAKT on May 03, 2012, 01:35:45 pm

Hey Peter,
Is it possible to use OpenCL (or CUDA) in the preprocess to speed things up dramatically? For example, it can convert FLAC to WAV (with my nVidia 9800 GTX) 2-3 times faster than my overclocked quad-core Q9550. So the idea would be to use the CPU and GPU together in the preprocess.

Roy

Title: Re: OpenCL -- GPU acceleration
Post by: PeterSt on May 06, 2012, 09:18:41 am

Hi Roy,
I guess it can. But it won't be as simple as just using the GPU, because this is all (explicitly) about multithreading, and everything is already set up in a multithreaded fashion. It is not about converting one FLAC track, but about a whole album (or more). So, in the current setup, one GPU core would need to be faster than one normal CPU core, and although I didn't look it up, I don't think this will be the case. Using numerous threads/cores for one FLAC will make that conversion faster all right, but what about the other tracks of the album? But I really don't know.

So yes, I thought about it for sure, but no, I didn't dive into it properly yet.

Btw, it seems that there's a working example out there somewhere (regarding your post). If you can point me to that (by email is better, I think) I can start looking there ...

Thanks!
Peter

Title: Re: OpenCL -- GPU acceleration
Post by: CoenP on May 06, 2012, 09:59:06 am

Perhaps you are looking for this: http://www.cuetools.net/wiki/FLACCL ?
The CPU is easily outperformed (on single FLACs).

Regards,
Coen

Title: Re: OpenCL -- GPU acceleration
Post by: PeterSt on May 06, 2012, 10:53:13 am

Thanks Coen. But that is exactly what I meant ...
Undoubtedly (but I didn't check) the conversion of one FLAC file is optimized to use all the GPU cores available, and I could have done the same with CPU cores and one file. But I did it the other way around and optimized everything for as many files in parallel as possible. This is how a 12-core (hyperthreaded) CPU converts 12 files in parallel, and any album of 12 tracks or fewer loads in 1-2 seconds.

So ... when one track is optimized to use all the GPU cores, those cores are no longer available for a next track to convert in parallel; that one track may be 10 times faster all right, but on net it wouldn't differ much (for an 8 or 12 core CPU).

Maybe it turns out to be very different, but at least the theory (ok, mine) holds me back from really attempting it at this moment. But the least I could do is try what happens with your given example (assuming the nVidia I have in there is the right one).

Ok, we'll see ...
Thanks,
Peter
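
The reasoning above can be put into a rough back-of-the-envelope sketch. It only models the argument and does not measure anything: the 1.5 seconds per track on one CPU core is an assumed figure (taken from the middle of the "1-2 seconds" mentioned above), the 10x GPU speed-up is the hypothetical number from the post, and the function names are made up for illustration. It also assumes the GPU handles tracks strictly one after another and ignores disk I/O and transfer overhead.

```python
import math


def cpu_album_time(tracks: int, cores: int, secs_per_track: float) -> float:
    """One track per core; tracks beyond the core count wait for the next batch."""
    return math.ceil(tracks / cores) * secs_per_track


def gpu_album_time(tracks: int, secs_per_track: float, gpu_speedup: float) -> float:
    """The whole GPU works on one track at a time, so tracks run back to back."""
    return tracks * secs_per_track / gpu_speedup


if __name__ == "__main__":
    TRACKS = 12            # album size used in the post
    SECS_PER_TRACK = 1.5   # assumption: one track on one CPU core
    GPU_SPEEDUP = 10.0     # hypothetical figure from the post

    gpu = gpu_album_time(TRACKS, SECS_PER_TRACK, GPU_SPEEDUP)
    for cores in (4, 8, 12):
        cpu = cpu_album_time(TRACKS, cores, SECS_PER_TRACK)
        print(f"{cores:2d} CPU cores: {cpu:.1f} s   GPU, serial tracks: {gpu:.1f} s")
```

With these assumed numbers the 12-core case comes out at 1.5 s for the CPU against 1.8 s for the GPU, which is the "on net it wouldn't differ much" situation, while a 4-core CPU would need 4.5 s for the same album. Whether real hardware behaves like this is exactly what would have to be tested.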