Help With Mass Downloads
Hi all! I'm working on a research project with the University of Virginia to help computers learn to read printed sheet music. To do this, we need to create a training dataset, which means downloading a massive number of MuseScore XML and PDF files. Does anyone know of a way to automate this download process?
Thanks!
Comments
I cannot help you with your request, unfortunately.
Are you planning to train only on SVG/vector-based music, which is what MuseScore-generated PDFs normally contain? Or are you also generating "pixel-based" images such as JPG or PNG?
It would be interesting to know what "correct answer" you are planning to use for the training. Is it a MuseScore file, a MusicXML file, or something else?
As I'm sure you know, there is a lot of printed music at IMSLP.org as well, and a substantial part of it has been copied/"transcribed" :-) into MuseScore scores. That could also serve as training material.
I would be surprised if there is a way to automate such a process. You're going to have to wander around the 'Net finding the scores. They're not particularly in one place where you could automate the download. One exception to that statement: IMSLP. Talking to them would probably be your first step. I believe there are other similar libraries of music; I just don't know where they are.
You can find a bunch of MuseScore files at:
You can use MuseScore to convert them to MusicXML and PDF on the command line.
See the YAML files in the data subdirectories for links to the IMSLP scores these transcriptions were based on.
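For the command-line conversion mentioned above, here is a rough Python sketch of how a batch run might look. It assumes the MuseScore executable is on your PATH as "mscore" (the name differs by version and platform, e.g. "mscore3" or "MuseScore4.exe") and that your build accepts the "-o <output> <input>" form and the ".musicxml" extension; older versions may want ".xml" instead. The function names and folder layout are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: executable name; adjust for your MuseScore version and platform.
MUSESCORE_BIN = "mscore"

# Score formats to look for in the source tree.
SCORE_SUFFIXES = {".mscz", ".mscx"}


def build_commands(src_dir, out_dir, exts=(".musicxml", ".pdf"), binary=MUSESCORE_BIN):
    """Return one MuseScore command line per (score, output format) pair.

    The output tree mirrors the source tree so that scores with the same
    file name in different folders do not overwrite each other.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    commands = []
    for score in sorted(src_dir.rglob("*")):
        if score.suffix.lower() not in SCORE_SUFFIXES:
            continue
        relative = score.relative_to(src_dir)
        for ext in exts:
            target = out_dir / relative.with_suffix(ext)
            commands.append([binary, "-o", str(target), str(score)])
    return commands


def convert_all(src_dir, out_dir):
    """Run every conversion and report the ones that fail."""
    for cmd in build_commands(src_dir, out_dir):
        # cmd[2] is the output path; make sure its folder exists first.
        Path(cmd[2]).parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"failed: {cmd[-1]}: {result.stderr.strip()}")
```

Calling convert_all("scores", "converted") would then write a MusicXML and a PDF file next to each other for every score found. Building the command list separately makes it easy to print and check the commands before actually running them on a large collection.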