
The Song Search
Deep Learning - CS 7150
Prof. David Bau
September 29, 2022
Team members
- Praveen Kumar Sridhar (sridhar.p@northeastern.edu)
- Isha Hemant Arora (arora.isha@northeastern.edu)
Literature Review
- Main Paper/Blog
- Links:
- Hawthorne, Curtis, et al. "Sequence-to-sequence piano transcription with Transformers." arXiv preprint arXiv:2107.09142 (2021).
- Gardner, Josh, et al. "MT3: Multi-Task Multitrack Music Transcription." arXiv preprint arXiv:2111.03017 (2021).
- https://magenta.tensorflow.org/transcription-with-transformers (the original blog)
- A brief review of these papers:
- The blog (and the papers) start by describing the task of Automatic Music Transcription (AMT), which is the task of extracting symbolic representations of music from raw audio.
- The authors then describe the course of their research, which initially focused on AMT for piano (as published in November 2021) and is now gradually extending to other instruments.
- To achieve their results, they implemented a T5-small encoder-decoder model (a minimal sketch of this seq2seq setup follows this section).
- Their major focus now is on building a general-purpose AMT system.
- For this general-purpose AMT they use MT3 (Multi-Task Multitrack Music Transcription), which we found very interesting (and which was the focus of the second paper, published in March 2022).
- Auxiliary Papers/Blogs
- Links:
- https://towardsdatascience.com/3-reasons-why-music-is-ideal-for-learning-and-teaching-data-science-59d892913608 (Max Hilsdorf)
- Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
- Gong, Yuan, Yu-An Chung, and James Glass. "AST: Audio Spectrogram Transformer." arXiv preprint arXiv:2104.01778 (2021).
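To make the seq2seq formulation concrete, the following is a minimal sketch (our own illustration, not the authors' code) of the idea behind the T5-small transcription model: log-mel spectrogram frames are projected to the model dimension and an encoder-decoder is trained to emit MIDI-like event tokens. The file name, vocabulary size, and layer dimensions are placeholders, and we use the Hugging Face T5 implementation rather than Magenta's own codebase.

    import librosa
    import torch
    from transformers import T5Config, T5ForConditionalGeneration

    # Load audio and compute a log-mel spectrogram (one frame per time step).
    audio, sr = librosa.load("piano_clip.wav", sr=16000)   # placeholder file name
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=229, hop_length=128)
    frames = torch.tensor(librosa.power_to_db(mel).T, dtype=torch.float32).unsqueeze(0)  # (1, T, 229)

    # A T5-small-sized encoder-decoder; the decoder vocabulary would hold the
    # MIDI-like events (note-on/off, velocity, time shift). Sizes are illustrative.
    config = T5Config(d_model=512, d_ff=1024, num_layers=6, num_heads=8, vocab_size=1536)
    model = T5ForConditionalGeneration(config)
    project = torch.nn.Linear(229, config.d_model)  # project mel frames to the model dimension

    # Teacher-forced training step: spectrogram frames in, event tokens out.
    dummy_events = torch.randint(0, config.vocab_size, (1, 64))  # placeholder targets
    loss = model(inputs_embeds=project(frames), labels=dummy_events).loss
    loss.backward()

In the actual papers the targets come from audio aligned with MIDI; here random integers merely stand in for those event tokens.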
Proposal for the Main Question
Aim
The aim of our project is to build an Information Retrieval system for music.
Flow
- As a first step, we have chosen to use Google Magenta’s research work on Music Transcription with Transformers to obtain a string representation of the music.
- Once we have understood and familiarized ourselves with this research work, we aim to work with this string representation of the music.
- As a next step, we want to extract vector representations of the music, which the transformer produces internally as a consequence of its encoder-decoder architecture.
- We will then use these vector representations to perform information retrieval tasks such as song identification and retrieval (a minimal sketch of this idea follows this list).
- Finally, we want to explore and implement other extensions that make use of these representations.
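Below is a minimal sketch (our own, not from the reviewed papers) of how such vector representations could support song identification: every song in a catalogue is embedded once, and a query clip is matched to its nearest neighbours by cosine similarity. The random vectors stand in for pooled encoder outputs, and the function and variable names are our own placeholders.

    import numpy as np

    def cosine_scores(query: np.ndarray, catalogue: np.ndarray) -> np.ndarray:
        """Cosine similarity between one query vector and every catalogue row."""
        query = query / np.linalg.norm(query)
        catalogue = catalogue / np.linalg.norm(catalogue, axis=1, keepdims=True)
        return catalogue @ query

    def identify_song(query: np.ndarray, catalogue: np.ndarray, titles, k=5):
        """Return the k catalogue titles whose embeddings are closest to the query."""
        scores = cosine_scores(query, catalogue)
        top = np.argsort(scores)[::-1][:k]
        return [(titles[i], float(scores[i])) for i in top]

    # Illustrative usage: 1000 songs with 512-dimensional placeholder embeddings.
    rng = np.random.default_rng(0)
    catalogue = rng.normal(size=(1000, 512))
    titles = [f"song_{i}" for i in range(1000)]
    query = catalogue[42] + 0.05 * rng.normal(size=512)   # a noisy clip of song_42
    print(identify_song(query, catalogue, titles))

In a real system the placeholder embeddings would be replaced by vectors pooled from the transcription model's encoder, and an approximate nearest-neighbour index could replace the brute-force similarity computation for large catalogues.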