I’m wondering whether I can use cGPT for a particular use case and, if so, how I would go about feeding training data to it?
What I am trying to accomplish: I want to be able to supply cGPT with a music file (.ogg or .mp3) and get the song's BPM to an accuracy of 0.001 BPM. Huge bonus points if it can also print out the second (down to 0.001 s) at which the BPM changes in a song.
The change of BPM (beats per minute) from one value to another cannot be located with arbitrary precision. At 60 BPM there is only one beat per second; you want 0.001 s resolution, which is one thousandth of a beat. A 1 kHz tone completes only one full cycle in that time.
It also depends on how long the samples are. A 0.2 second sample is hardly going to give a BPM at all.
Maybe you can resolve differences of a fraction of a BPM at a high initial BPM and with long samples. But that is about it.
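To put rough numbers on the sample-length point, here is a minimal sketch of first-order error propagation. It assumes the BPM is computed as 60 × beats / span, that the span contains at least a couple of beats, and that the only error is a timing error on the measured span. The function name and the 5 ms timing error are made up for illustration, not taken from any real tool.

```python
def bpm_uncertainty(bpm: float, span_seconds: float, timing_error_seconds: float) -> float:
    """Approximate BPM error when the measured span is off by timing_error_seconds.

    Since BPM = 60 * beats / span, a small error in the span shifts the
    result by roughly BPM * error / span (first-order approximation).
    """
    return bpm * timing_error_seconds / span_seconds


# Hypothetical 5 ms timing error at 110 BPM, for increasingly long spans.
for span in (2.0, 30.0, 300.0):
    err = bpm_uncertainty(110.0, span, 0.005)
    print(f"{span:6.1f} s span -> roughly +/- {err:.4f} BPM")
```

Under these assumptions the error shrinks in proportion to the span length, so short clips are far from 0.001 BPM, and even a 5 minute span with a 5 ms error lands at about 0.002 BPM.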
Then there is the bigger question: why does it even matter? How would it be relevant whether it is 60 or 60.001 BPM?
This application of deep learning would apply to music suitable for playing DDR/ITG/Stepmania/Stepmaniax/PIU etc.; essentially music gaming:
Most music that would be reasonably fun to play falls within 110-240 BPM and runs between 2.5 and 7 minutes long. At 110 BPM, a song coded at 110 BPM but with a true BPM of 110.001 will drift by roughly 2 ms over the course of the song. Music games are predicated on timing precision down to 15 ms as a minimum. I, myself, hit notes within a rough range of 6 ms at my best (and I’m barely top 100 in the world).
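For anyone who wants to check the drift figure, here is a small sketch of the arithmetic. It assumes a constant tempo for the whole song and that the chart and the audio line up exactly at the start; `drift_ms` is just an illustrative helper name.

```python
def drift_ms(coded_bpm: float, true_bpm: float, minutes: float) -> float:
    """Offset in ms between chart and audio at the end of the song.

    Assumes a constant tempo and perfect sync at time zero.
    """
    beats_played = true_bpm * minutes              # beats the audio actually contains
    chart_time = beats_played * 60.0 / coded_bpm   # where the chart places the last beat
    audio_time = minutes * 60.0                    # where the last beat really is
    return (chart_time - audio_time) * 1000.0


# Shortest and longest song lengths mentioned above, at 110 vs 110.001 BPM.
for minutes in (2.5, 7.0):
    print(f"{minutes} min: {drift_ms(110.0, 110.001, minutes):.2f} ms of drift")
```

With these assumptions the drift comes out at about 1.4 ms for a 2.5 minute song and about 3.8 ms for a 7 minute song, which is in the same ballpark as the roughly 2 ms quoted above.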
scorecard for reference
You can produce the audio with arbitrary temporal precision. The issue is that this precision is simply impossible to reconstruct given the low number of “virtual sample points per time” (that is, the points that are relevant for the BPM). The same goes for the wavelength of the actual sound discussed above, which puts up yet another limit: just measuring the frequency becomes less and less accurate, or even impossible.