PRE2019 3 Group15
Group Members
Name | Study | Student ID |
---|---|---|
Mats Erdkamp | Industrial Design | 1342665 |
Sjoerd Leemrijse | Psychology & Technology | 1009082 |
Daan Versteeg | Electrical Engineering | 1325213 |
Yvonne Vullers | Electrical Engineering | 1304577 |
Teun Wittenbols | Industrial Design | 1300148 |
Problem Statement and Objectives
DJ-ing is a relatively new profession. It has been around for less than a century, but it has become increasingly widespread over the last few decades. For the most part, this activity has been carried out by human beings. Meanwhile, technology in the music industry has become better and better at generating playlists or 'recommended songs', as Spotify, for example, does. Can we integrate a form of this technology into the world of DJs and create a 'robot DJ'? A robot DJ would autonomously create playlists and mix songs, based on algorithms and real-life feedback, in order to entertain an audience.
How can we develop an autonomous system (a robot DJ) that the user can easily deploy as a substitute for a human DJ?
Users
The identification of the primary and secondary users and their needs is based on extensive literature research on the interaction between the DJ and the audience in a party setting. The reader is referred to the references section for the full articles. This section is based on (Gates, Subramanian & Gutwin, 2006), (Gates & Subramanian, 2006) and (Berkers & Michael, 2017).
Primary users
- Dance industry: this is the overarching organization that will possess most of the robots.
- Organizer of a music event: this is the user that will rent or buy the robot to play at their event.
- Owner of a discotheque or club: the robot can be an artificial alternative for hiring a DJ every night.
Primary user needs
- The DJ-robot is more valued than a human performer.
- The DJ-robot system provides something extraordinary and special.
- The system is better than a human DJ in gathering information regarding audience appreciation.
- The system's user interface is easy to understand, no experts needed.
Secondary users
- Attenders of a music event: these people enjoy the music and lighting show that the robot makes.
- Human DJs: likely to "cooperate" with a DJ-robot to make their show more attractive.
Secondary user needs
- The music set played is structured and progressive.
- Track selection should fit the audience background.
- The system selects appropriate tracks regarding genre.
- The musical presentation should reflect the audience energy level.
- There is a balance between playing familiar tracks and providing rare, new music.
- The system selects popular tracks that are valued by the audience.
- Similar tracks to what the audience is into are played.
- The dancers are taken on a cohesive and dynamic music journey.
- The audience desires control over the music being played, to a certain extent.
- The audience wants to hear their favourite music.
- The audience doesn't want a predictable set of music.
- The DJ-robot takes the audience reaction into account in track selection.
Approach, Milestones, and Deliverables
Approach
The goal of the project is to create a robot that functions as a DJ and provides entertainment to a crowd. In order to reach the goal, first a literature study will be executed to find out the current state of the art regarding the problem. After enough information has been collected, an objective will be defined.
Then, the USE and technical aspects of the problem will be researched. The results of the technical research will be incorporated into a design for the robot. Based on this design, a prototype will be built and programmed that meets the requirements of the goal.
Milestones
In order to complete the project and meet the objective, milestones have been determined. These milestones include:
- A clear problem and goal have been determined
- The literature research is finished; this includes research on:
  - Users (attenders of music events, DJs, club owners)
  - The state of the art in the music (AI) industry
- The research on how to create the DJ-software is finished:
  - Ways of feedback from a crowd
  - Spotify API
- The DJ-software is created
- Depending on what method of feedback is chosen, a sensor is also built
- A test is executed that simulates the environment in which the software will be used
- The wiki is finished and contains all information about the project
Other milestones, which are probably not attainable within the 8-week scope, are:
- A test in a larger environment is executed (bar, festival)
- A full-scale robot is constructed to improve the crowd’s experience
- A light show is added in order to improve the crowd’s experience
Deliverables
The deliverables for this project are:
- The DJ-software, which is able to use feed-forward and incorporate user feedback in order to create the most entertaining DJ-set
- Depending on what type of user feedback is chosen, a prototype/sensor also needs to be delivered
- The wiki-page containing all information on the project
- The final presentation in week 8
Who's Doing What?
Personal Goals
The following section describes the main roles of the teammates within the design process. Each team member has chosen an objective that fits their personal development goals.
Name | Personal Goal |
---|---|
Mats Erdkamp | Play a role in the development of the artificial intelligence systems. |
Sjoerd Leemrijse | Gain knowledge in recommender systems and pattern recognition algorithms in music. |
Daan Versteeg | |
Yvonne Vullers | Play a role in creating the prototype/artificial intelligence |
Teun Wittenbols | Combine all separate parts into one good concept, with a focus on user interaction. |
Weekly Log
Based on the approach and the milestones, a schedule has been made. This schedule is not final and will be updated regularly; however, it will serve as a guideline for the coming weeks.
Week 2
Goal: Do literature research, define problem, make a plan, define users, start research into design and prototype
Day | Group | Mats Erdkamp | Sjoerd Leemrijse | Daan Versteeg | Yvonne Vullers | Teun Wittenbols |
---|---|---|---|---|---|---|
Monday 10-02 | We formed a group and discussed the first possibilities within the project, chose a general theme and started doing research. We attended the tutor session. | Work on SotA and evaluate design options | Attended meeting with tutors: 30 minutes. Elaborated the notes of the meeting: 30 minutes. Reading and summarizing scientific literature on the interaction between DJ and crowd: 3 hours | Attended meeting with tutors: 30 minutes | Attended tutor meeting | Started doing literature research and summarized Pasick (2015) and Johnson (n.a.). Formed a group and attended the meeting. |
Tuesday 11-02 | Searching scientific literature on user requirements of the public and the DJ at a party (Gates, Subramanian & Gutwin, 2006), (Gates & Subramanian, 2006): 1 hour | |||||
Wednesday 12-02 | Summarizing scientific literature on user requirements: 3 hours | Started looking for papers about user interaction and user feedback | ||||
Thursday 13-02 | We had a meeting in which we discussed the feedback from the tutor session, discussed the research and formed a more detailed and specific plan for the project. | Meeting with group members, discussing who will be doing what the coming week: 1.5 hours | Meeting with group members, discussing who will be doing what the coming week. Emailed the Effenaar. | Did some more research on audience interaction and summarized it (Hödl, Fitzpatrick, Kayali & Holland, 2017), (Zhang, Wu, & Barthet, in press). I also updated the wiki, made the planning clearer and divided the references of the SotA. Attended the meeting. | ||
Friday 14-02 | Developed access to the Spotify API: 4 hours | Researched different forms of audience interaction and added them to the SotA (section "Receiving feedback from the audience via technology"). | ||||
Saturday 15-02 | Created data set from the Spotify API: 8 hours | Writing the "Users" section based on my prior literature research: 2 hours. Updating the section on state of the art based on my prior literature research: 2 hours | ||||
Sunday 16-02 | Updating the layout of the wiki: 1 hour | Updating the milestones and deliverables. Continued looking for papers on incorporating user feedback and user feedback for music events. Summarized papers (Barkhuus & Jorgensen, 2008) & (Atherton, Becker, McLean, Merkin & Rhoades, 2008) |
Week 3
Goal: Continue research, start on design
Day | Group | Mats Erdkamp | Sjoerd Leemrijse | Daan Versteeg | Yvonne Vullers | Teun Wittenbols |
---|---|---|---|---|---|---|
Monday 17-02 | Group meeting and general discussion: 45 minutes | Attended tutor meeting: 30 minutes. Discussion with group members: 15 minutes. | Attended tutor meeting: 30 minutes. Discussion with group members: 15 minutes. | Attended tutor meeting: 30 minutes. Discussion with group members: 15 minutes. | Attended tutor meeting: | |
Tuesday 18-02 | ||||||
Wednesday 19-02 | Worked on a first concept model of our designed system (section "A first model"): 4 hours | Elaborated the notes of the meeting: 30 minutes | ||||
Thursday 20-02 | Group meeting, discussing plans and everyone's contributions: 1 hour | Group meeting, discussed plans: 1 hour | Attended group meeting: 1 hour. Worked on how the user needs lead to a first model: 2 hours | Attended group meeting: 1 hour | Attended group meeting: 1 hour | Attended group meeting: |
Friday 21-02 | ||||||
Saturday 22-02 | ||||||
Sunday 23-02 |
Carnaval break
Day | Group | Mats Erdkamp | Sjoerd Leemrijse | Daan Versteeg | Yvonne Vullers | Teun Wittenbols |
---|---|---|---|---|---|---|
Monday 24-02 | Worked on a schematic to show how the user needs relate to our design: 1.5 hours | |||||
Tuesday 25-02 | ||||||
Wednesday 26-02 | ||||||
Thursday 27-02 | Worked on a model to predict music parameters using the Spotify dataset in a multiple regression model: 2 hours | Got to know the basics of node.js | ||||
Friday 28-02 | Worked on getting to know JavaScript, node.js, and Visual Studio: 6 hours. | |||||
Saturday 29-02 | ||||||
Sunday 1-03 | Started integration of the data set in the tempo curve algorithm: 4 hours. | Did research on the Effenaar: 30 minutes. Elaborating on the algorithm in the first model: 2 hours | Continued on getting to know JavaScript and node.js. Also started on getting to know Express JS: 4 hours. |
Week 4
Goal: Finish first design, start working on software.
Day | Group | Mats Erdkamp | Sjoerd Leemrijse | Daan Versteeg | Yvonne Vullers | Teun Wittenbols |
---|---|---|---|---|---|---|
Monday 02-03 | We attended the tutor session and went to the Effenaar to gather general information about the possibilities and user needs. | Attended the tutor meeting: 30 minutes. Had a conversation with the Effenaar: 1 hour | Attended the tutor meeting: 30 minutes. Had a conversation with the Effenaar: 1 hour. Worked out the notes on the conversation with the Effenaar: 30 minutes | Attended the tutor meeting: 30 minutes. Had a conversation with the Effenaar: 1 hour | Attended the tutor meeting: 30 minutes | |
Tuesday 03-03 | Worked on explaining the feedforward and feedback parameters more clearly in the concept model: 2 hours | |||||
Wednesday 04-03 | Did research on the Spotify audio features, tried to come up with exact definitions: 2 hours | |||||
Thursday 05-03 | We had a meeting in which we discussed the feedback from the tutor session, discussed the research and formed a more detailed and specific plan for the project. | Attended the group meeting: 1.5 hours | Attended the group meeting: 1.5 hours | Attended the group meeting: 1.5 hours | Attended the group meeting: 1.5 hours | Attended the meeting |
Friday 06-03 | Cleaned up data set generation code: 2 hours | |||||
Saturday 07-03 | Added new tags to the data set and included a pre-filtering backend: 7 hours | Worked on a multiple regression model for feedforward: 4 hours | ||||
Sunday 08-03 | Finalized the pre-filtering backend and made API calls more reliable: 4 hours | Processed the results of the multiple regression analysis and added them to the wiki: 3 hours | Worked more on node.js, Express JS and the UI: 6 hours |
Week 5
Goal: Work on software
Week 6
Goal: Finish software, do testing
Week 7
Goal: Finish up the last bits of the software
Week 8
Goal: Finish wiki, presentation
State of the Art
The dance industry is a booming business that is very receptive to technological innovation. A lot of research has already been conducted on the interaction between DJs and the audience, and on automating certain cognitively demanding tasks of the DJ. Therefore, it is necessary to give a clear description of the technologies currently available in this domain. In this section, the state of the art on topics relevant to designing a DJ-robot is described on the basis of recent literature.
Defining and influencing characteristics of the music
When the system receives feedback from the audience, it must also be able to do something useful with that feedback and convert it into changes in the musical arrangement it provides. The most important aspects of this arrangement are chaos, energy and tempo (more features are to be added during the project).
Chaos is the opposite of order. Dance tracks with order can be assumed to have a repetitive rhythmic structure, to contain only a few low-pitched vocals and to display a pattern in arpeggiated lines and melodies. Chaos can be created by undermining these rules, for example by playing random notes, changing the rhythmic elements at random time points, or increasing the number of voices present and altering their pitch. Such procedures create tension in the audience. This is necessary because without differential tension there is no sense of progression (Feldmeier, 2003).
The factors energy and tempo are inherently linked to each other: when the tempo increases, so does the perceived energy level of the track. In general, the music's energy level can be intensified by introducing high-pitched, complex vocals and a strong occurrence of the beat. Related to this is the activity of the audience. An increase in audience activity can signal that they are enjoying the current music, or that they want to move on to the next energy level (Feldmeier, 2003). This is because, in general, people enjoy tempo activation, in which they dance to music that leads their current pace (Feldmeier & Paradiso, 2004).
Algorithms for track selection
One of the most important tasks of a DJ-robot, if not the most important, is track selection. Cliff (2006) describes a system that takes as input a set of selected tracks and a qualitative tempo trajectory (QTT). A QTT is a graph of the desired tempo of the music during the set, with time on the x-axis and tempo on the y-axis. Based on these two inputs, the algorithm finds the sequence of tracks that best fits the QTT and makes sure that the tracks form a cohesive set of music. The process consists of the following steps, in order: track mapping, trajectory specification, sequencing, overlapping, time-stretching and beat-mapping, and crossfading.
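To make the QTT idea concrete, the sketch below shows a deliberately simple greedy sequencer. It is not Cliff's method, which is more elaborate; it only illustrates how a QTT can drive track selection. Assumptions: the QTT is given as a list of (minute, BPM) control points with strictly increasing minutes, the tempo between points is linearly interpolated, and each track is reduced to a BPM and a duration. The `Track` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Track:
    title: str
    bpm: float
    duration_min: float


def qtt_target(qtt, t):
    """Desired tempo (BPM) at minute t, linearly interpolated between the
    (minute, bpm) control points of the QTT (minutes strictly increasing)."""
    if t <= qtt[0][0]:
        return qtt[0][1]
    for (t0, b0), (t1, b1) in zip(qtt, qtt[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return qtt[-1][1]


def greedy_sequence(tracks, qtt):
    """At each point in the set, pick the unused track whose BPM is closest
    to the QTT target at that moment; stop when the QTT ends."""
    remaining = list(tracks)
    sequence, t = [], 0.0
    while remaining and t < qtt[-1][0]:
        target = qtt_target(qtt, t)
        best = min(remaining, key=lambda tr: abs(tr.bpm - target))
        remaining.remove(best)
        sequence.append(best)
        t += best.duration_min  # overlaps between tracks are ignored here
    return sequence
```

A greedy choice is easy to follow but can paint itself into a corner late in the set; a search over whole orderings, such as the genetic approach mentioned next, avoids that at a higher computational cost.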
In the same article, genetic algorithms are used to determine the fitness of each song for a certain situation. Cliff sketches how a multi-track specification of a song can be encoded as the genes of the individuals in a genetic algorithm (Cliff, 2006).
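The sketch below illustrates the general genetic-algorithm idea, not Cliff's gene encoding of multi-track songs, which is more detailed. It is a generic permutation GA in which each individual is a track order and its cost is the summed distance between each track's BPM and the QTT target at the moment that track starts. Order crossover, swap mutation, truncation selection and all parameter values (`pop_size`, `generations`, `rate`) are our own assumptions. Tracks are (bpm, duration in minutes) tuples.

```python
import random


def qtt_target(qtt, t):
    """Desired BPM at minute t, linearly interpolated between (minute, bpm) points."""
    if t <= qtt[0][0]:
        return qtt[0][1]
    for (t0, b0), (t1, b1) in zip(qtt, qtt[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return qtt[-1][1]


def deviation(order, tracks, qtt):
    """Cost of a track order: summed |BPM - QTT target| at each track's start."""
    t, total = 0.0, 0.0
    for i in order:
        bpm, duration = tracks[i]
        total += abs(bpm - qtt_target(qtt, t))
        t += duration
    return total


def order_crossover(p1, p2, rng):
    """Order crossover (OX): keep a random slice of p1 and fill the gaps with
    the remaining track indices in the order in which they appear in p2."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    kept = set(p1[a:b])
    fill = iter(g for g in p2 if g not in kept)
    return [g if g is not None else next(fill) for g in child]


def mutate(order, rng, rate=0.2):
    """With probability `rate`, swap two positions in the order."""
    order = list(order)
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def evolve(tracks, qtt, pop_size=30, generations=100, seed=0):
    """Evolve track orders; the better half of each generation survives
    unchanged (elitism), the rest is replaced by mutated offspring."""
    rng = random.Random(seed)
    n = len(tracks)

    def cost(order):
        return deviation(order, tracks, qtt)

    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(order_crossover(p1, p2, rng), rng))
        pop = parents + children
    return min(pop, key=cost)
```

Because the best half always survives, the best cost in the population can only stay equal or improve from one generation to the next.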
Receiving feedback from the audience via technology
Audiences at musical events generally like the idea of being able to influence the performance (Hödl et al., 2017). What matters is the way in which they interact with the performers: what works well, and what do users prefer?
Hödl et al. (2017) describe multiple ways of interacting, namely mobile devices (as in McAllister et al., 2004), smartphones, and other sensory mechanisms, such as the light sticks discussed by Freeman (2005).
The system presented by Cliff (2006) already proposes some technologies that enable the crowd to give feedback to a DJ-robot. It discusses under-floor pressure sensors that can sense how the audience is distributed over the party area, as well as video surveillance that can read the crowd's activity and valence towards the music. Based on this information, the system determines whether the assemblage of the music should be changed or not. In principle, the system tries to stick to the provided QTT; however, it reacts dynamically to the crowd's feedback and may deviate from the QTT if that is what the public wants.
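The principle of sticking to the QTT while letting the crowd pull the set away from it can be sketched as a small rule. This is our own illustration, not Cliff's implementation. The crowd signal on a -1..+1 scale, the dead band `threshold` (below which the system keeps following the QTT) and the cap `max_dev` in BPM are all hypothetical choices.

```python
def adjusted_target(qtt_bpm, feedback, threshold=0.2, max_dev=8.0):
    """Target tempo that follows the QTT unless the crowd clearly wants otherwise.

    qtt_bpm:  tempo the QTT prescribes at this moment.
    feedback: aggregated crowd signal, -1 (calmer) .. +1 (more energy);
              values outside this range are clipped.
    """
    feedback = max(-1.0, min(1.0, feedback))
    if abs(feedback) < threshold:
        return qtt_bpm  # crowd is roughly neutral: stick to the QTT
    return qtt_bpm + max_dev * feedback  # deviate, but by at most max_dev BPM
```

For example, with a QTT target of 128 BPM, a clearly positive signal of 0.5 raises the target to 132 BPM, while a weak signal of 0.1 leaves it at 128 BPM.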
Another option for crowd feedback discussed by Cliff is a portable technology, such as a wristband. One option is a more quantitative wristband that transmits location, dancing movements, body temperature, perspiration rate and heart rate to the system. Another option is a much simpler and therefore cheaper solution: a wristband with two buttons on it, one "positive" button and one "negative" button. In that way, the public lets the system know whether they like the current music or not, and the system can react accordingly.
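For the two-button wristband, the system needs to turn a stream of presses into a single approval signal. A minimal sketch, assuming each press arrives with a timestamp in seconds and a wristband id, and that presses arrive in time order: only presses within a sliding window count, and each wristband's latest press is its vote, so repeated pressing does not give one person more weight. The window length and class name are hypothetical.

```python
from collections import deque


class ButtonVotes:
    """Aggregate positive/negative presses from two-button wristbands."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.presses = deque()  # (timestamp_s, wristband_id, +1 or -1), time-ordered

    def press(self, t, wristband_id, positive):
        self.presses.append((t, wristband_id, 1 if positive else -1))

    def approval(self, now):
        """Net approval in [-1, 1] over the last window_s seconds."""
        # Drop presses that have fallen out of the window.
        while self.presses and self.presses[0][0] < now - self.window_s:
            self.presses.popleft()
        latest = {}
        for _, wid, vote in self.presses:
            latest[wid] = vote  # a later press overwrites an earlier one
        if not latest:
            return 0.0
        return sum(latest.values()) / len(latest)
```

The resulting value lies on the same -1..+1 scale as the feedback signal used in the QTT-deviation sketch above.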
Summary of Related Research
This patent describes a system, such as a personal computer or an MP3 player, that incorporates user feedback in its music selection. The player has access to a database and chooses music to play based on user preferences. While a song is playing, the user can rate it, and this rating is taken into account when choosing the next song. (Atherton, Becker, McLean, Merkin & Rhoades, 2008)
This article describes how rap-battles incorporate user feedback. By using a cheering meter, the magnitude of enjoyment of the audience can be determined. This cheering meter was made by using Java's Sound API. (Barkhuus & Jorgensen, 2008)
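A cheering meter boils down to measuring how loud the crowd is. The sketch below is in Python rather than Java and does not reproduce the system of Barkhuus & Jorgensen. It assumes microphone audio is already available as a block of float samples in [-1, 1], computes the RMS level in dB relative to full scale (dBFS), and maps it linearly onto a 0..1 meter. The -60 dBFS floor is a hypothetical calibration value.

```python
import math


def rms_dbfs(samples):
    """RMS level of a block of float samples in [-1, 1], in dBFS."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def cheer_meter(samples, floor_db=-60.0):
    """Map the level linearly from floor_db..0 dBFS onto a 0..1 meter."""
    level = rms_dbfs(samples)
    return max(0.0, min(1.0, (level - floor_db) / -floor_db))
```

In a real venue the music itself is loud, so the microphone placement or a comparison against the playback level would matter more than the formula.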
Describes a system that transcribes the drums in a song. This could be used as input for the DJ-robot (for light controls, for example). (Choi & Cho, 2019)
This paper is meant for beginners in the field of deep learning for MIR (Music Information Retrieval). This is a very useful technique in our project to let the robot gain musical knowledge and insight in order to play an enjoyable set of music. (Choi, Fazekas, Cho & Sandler, 2017)
This article describes different ways to automatically detect patterns in music from which the genre of the music can be determined. Knowing the genre of the music that is played makes it easier to judge whether a track will fit the previously played music. (De Léon & Inesta, 2007)
This describes "Glimmer", an audience interaction tool consisting of light sticks which influence live performers. (Freeman, 2005)
Describes the creation of a data set to be used by artificial intelligence systems in an effort to learn instrument recognition. (Humphrey, Durand & McFee, 2018)
This article describes methods to learn features of music using deep belief networks. It extracts low-level acoustic features for music information retrieval (MIR), from which it can determine, e.g., the genre of a musical piece. The goal of the article is a system that can do this automatically. (Hamel & Eck, 2010)
This article communicates the results of a survey among musicians and attenders of musical concerts. The questions were about audience interaction. "... most spectators tend to agree more on influencing elements of sound (e.g. volume) or dramaturgy (e.g. song selection) in a live concert. Most musicians tend to agree on letting the audience participate in (e.g. lights) or dramaturgy as well, but strongly disagree on an influence of sound." (Hödl, Fitzpatrick, Kayali & Holland, 2017)
This article explains the workings of the musical robot Shimon. Shimon is a robot that plays the marimba and chooses what to play based on an analysis of musical input (beat, pitch, etc.). The creation of pieces is not necessarily relevant for our problem; however, choosing the next piece of music is. Also, Shimon has a social-interactive component through which it can play together with humans. (Hoffman & Weinberg, 2010)
This article introduces Humdrum, software with a variety of applications in music (see also humdrum.org). Humdrum is a set of command-line tools that facilitates musical analysis. It is often used in, for example, Python or C++ scripts to build programs with applications in music. Therefore, it might be of interest to our project. (Huron, 2002)
This article focuses on next-track recommendation. While most systems base this recommendation only on the songs listened to previously, this paper takes a multi-dimensional approach (including, for example, long-term user preferences) to make a better recommendation for the next track to be played. (Jannach, Kamehkhosh & Lerche, 2017)
In this interview with a developer of the robot DJ system POTPAL, some interesting possibilities for a robot system are mentioned. For example, the use of existing top 40 lists, 'beat matching' and 'key matching' techniques, monitoring of the crowd to improve the music choice and to influence people's beverage consumption and more. Also, a humanoid robot is mentioned which would simulate a human DJ. (Johnson, n.a.)
In this paper, a music scene analysis system is developed that can recognize rhythm, chords and source-separated musical notes from incoming music using a Bayesian probability network. Even though a paper from 1995 is not exactly state of the art, these kinds of technology could be used in our robot to work with music. (Kashino, Nakadai, Kinoshita, & Tanaka, 1995)
This paper discusses audience interaction by use of hand-held technological devices. (McAllister, Alcorn, Strain & 2004)
This article discusses the method by which Spotify generates popular personalized playlists. The method consists of comparing your playlists with other people's playlists, as well as creating a 'personal taste profile'. Our robot DJ could use such techniques, for example, to create a playlist based on the music that people collectively listen to the most. It would be interesting to see whether connecting people's Spotify accounts to the DJ would improve its performance. (Pasick, 2015)
This paper takes a mathematical approach to recommending new songs to a person, based on their similarity to previously listened and rated songs. Such algorithms are very common in music systems like Spotify and of great use in a DJ-robot, which has to know which songs fit its current set and therefore needs these algorithms for track selection. (Pérez-Marcos & Batista, 2017)
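As a rough illustration of similarity-based track selection (a generic content-based scheme, not the specific model of Pérez-Marcos & Batista), the sketch below scores each candidate by its cosine similarity to songs the audience already rated, weighted by those ratings. Representing songs as audio-feature vectors such as [energy, valence] and ratings on a -1..+1 scale are our assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recommend(rated, candidates, k=1):
    """rated: list of (feature_vector, rating) pairs with rating in [-1, 1].
    candidates: dict mapping track name -> feature_vector.
    Returns the k candidates with the highest rating-weighted similarity."""
    scores = {
        name: sum(r * cosine(vec, v) for v, r in rated)
        for name, vec in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A track that resembles well-liked songs gains score, and one that resembles disliked songs loses it, which matches the intuition of "play more of what the audience is into".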
This paper describes the difficulty of matching two musical pieces due to the complexity of rhythm patterns. It then determines a procedure for minimizing the error in the matching of the rhythm. The article is not very recent, but it is very relevant to our problem. (Shmulevich & Povel, 1998)
In this article, the author states that the main melody in a piece of music is a significant feature for music style analysis. It proposes an algorithm that can be used to extract the melody from a piece and the post-processing that is needed to extract the music style. (Wen, Chen, Xu, Zhang & Wu, 2019)
This research presents a robot that is able to move according to the beat of the music and is also able to predict the beats in real time. The results show that the robot can adjust its steps in time with the beat times as the tempo changes. (Yoshii, Nakadai, Torii, Hasegawa, Tsujino, Komatani, Ogata & Okuno, 2007)
This paper describes Open Symphony, a web application that enables audience members to influence musical performances. They can indicate a preference for different elements of the musical composition in order to influence the performers. Users were generally satisfied with and interested in this way of enjoying the musical performance and indicated a higher degree of engagement. (Zhang, Wu, & Barthet, in press)
A first model
Based on the state of the art and the user needs a first model of our automated DJ system is made. We chose to depict it in a block diagram with separate blocks for the feedforward, the feedback and the algorithm itself.
Feedforward
The feedforward part of our system is based entirely on user input. The user has a lot of knowledge about the desired DJ set beforehand, and this information is fed to the system to control it. Because this feedforward block is based on user input, it has to address the user needs. Below, a scheme is presented to show how the user needs relate to the feedforward parameters.
The first part of the feedforward is the desired QTT of the set, as described in the State of the Art section. The ability for the user to input a QTT addresses the primary user need of an easy user interface. Providing a QTT also makes it easy for the system to play a structured and progressive set of music, which is a secondary user need. This is in line with the secondary user need that dancers are taken on a cohesive and dynamic music journey. This structure in the music will also keep the audience engaged.
To fulfill the desired QTT, certain tracks are delivered to the system as feedforward to pick from. These tracks come from a very large database with all kinds of music in it, so that the system has enough options to form the best set. In that sense, the system can pick the audience's favourite tracks, which are popular and valued by the audience; these are some of our secondary user needs. Since people come to music events with certain expectations, the tracks to pick from should be appropriate regarding genre. That way, the audience background is taken into account, which keeps them engaged. This is also an opportunity to limit the algorithm's options for track selection, making the system more stable. Because the database the system can choose from is very large and diverse, the user need of hearing rare, new music is met. This also contributes to an unpredictable set of music that is not boring.
MATS, COULD YOU EVENTUALLY WRITE SOMETHING ABOUT THE MOST IMPORTANT SPOTIFY FEATURES THAT WE WILL USE AS FEEDFORWARD (FF)?
In order to satisfy the audience, we have to feed forward background information about the audience members to the system. In that way, the system has knowledge about what the current audience is into. This could include their preferences regarding audio features, based on their Spotify profile, and their favourite or most hated tracks. This addresses the primary user needs of making the system something extraordinary and more valued than a human DJ, because a human DJ can never know this information for every audience member. It also makes the system better than a human DJ at gathering information regarding audience appreciation, which is another primary user need. Furthermore, it addresses the secondary user needs of keeping a balance between familiar and new music (because the system knows which tracks are familiar) and of audience members wanting to express themselves, even if only by providing information to a computer. It also serves the desire to play tracks similar to what the audience is into, and the desire to create a collective experience by means of a music set. This collective experience arises because every audience member contributes a part of the feedforward of the system.
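The step in which every audience member contributes a part of the feedforward could look roughly like the sketch below. We assume each member's Spotify-derived data has already been reduced to a dictionary of preferred audio-feature values plus lists of favourite and hated track ids; retrieving that data from Spotify is not shown. The field names and the `veto_share` rule (exclude a track if at least a quarter of the audience hates it) are hypothetical.

```python
from collections import Counter


def aggregate_audience(members, veto_share=0.25):
    """Combine individual audience profiles into one collective feedforward.

    members: list of dicts with keys
      'features'   -> {audio feature name: preferred value},
      'favourites' -> list of track ids,
      'hated'      -> list of track ids.
    Returns (mean feature profile, favourite counts, vetoed track ids).
    """
    if not members:
        return {}, Counter(), set()
    n = len(members)
    totals, counts = Counter(), Counter()
    for m in members:
        for name, value in m["features"].items():
            totals[name] += value
            counts[name] += 1
    profile = {name: totals[name] / counts[name] for name in totals}
    favourites = Counter(t for m in members for t in m["favourites"])
    hated = Counter(t for m in members for t in m["hated"])
    vetoed = {t for t, c in hated.items() if c / n >= veto_share}
    return profile, favourites, vetoed
```

The favourite counts can rank familiar tracks, while the vetoed set keeps widely disliked tracks out of the candidate pool.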
Feedback
The feedback of the system consists of sensor output. This is the part where the audience takes more control. The feedback sensor system detects how many people are present on the dance floor relative to the rest of the event area. This can be done by means of pressure sensors in the floor, and it indicates whether the current music is appreciated or not. Another cue for appreciation of the music can be generated via active feedback from the crowd. For example, the public can indicate valence by means of the technologies described in the State of the Art section. Because audience feedback is not commonly used at music events, this addresses the primary user need of providing something special and extraordinary. The ability of the crowd to give feedback addresses the primary user need that the system should be better than a human DJ at gathering information regarding audience appreciation. This ability also gives the public as a whole influence on the music, which creates a collective experience. Additionally, it meets the secondary user need of having control over the music being played. In that sense, the track selection procedure takes the audience reaction into account, such that it comes up with tracks that are valued by the audience members. This also lets the public control whether the music is predictable or not, which should keep them engaged.
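The dance-floor cue can be sketched as follows, assuming the floor of the dance area and of the rest of the venue is covered with pressure tiles that each report a load in kg. A tile above a threshold is counted as one occupied spot, which is only a rough, hypothetical proxy for one person; the threshold and the `margin` for deciding that the share has really changed are illustrative values.

```python
def occupancy_ratio(floor_tiles, area_tiles, threshold_kg=20.0):
    """Share of occupied tiles that lie on the dance floor (0.0 if nobody is detected).

    floor_tiles / area_tiles: load readings (kg) of the pressure tiles on the
    dance floor and in the rest of the event area.
    """
    on_floor = sum(1 for w in floor_tiles if w >= threshold_kg)
    elsewhere = sum(1 for w in area_tiles if w >= threshold_kg)
    total = on_floor + elsewhere
    return on_floor / total if total else 0.0


def appreciation_cue(prev_ratio, ratio, margin=0.05):
    """+1 if people are moving onto the dance floor, -1 if they are leaving it,
    0 if the change is within the margin."""
    if ratio > prev_ratio + margin:
        return 1
    if ratio < prev_ratio - margin:
        return -1
    return 0
```

Comparing the ratio before and after a track change, rather than looking at its absolute value, makes the cue less sensitive to how busy the venue is overall.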
The other part of the sensory feedback is the audience energy level, which may include the activity or movement of the audience members. This can be measured passively by means of a wristband with different options for sensing activity or energy, such as heart rate, accelerometer data, sweat response or other options described in the State of the Art section. Incorporating audience energy level feedback addresses the primary user need of gathering information about audience appreciation better than a human DJ can. It also addresses the secondary user need of making the musical presentation reflect the audience energy level.
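Of the wristband options, accelerometer data is the easiest to turn into an activity measure. A minimal sketch, assuming each wristband delivers a window of (x, y, z) acceleration samples in m/s²: the spread (population standard deviation) of the acceleration magnitude is near zero for someone standing still and grows with movement. Averaging over wearers and scaling by `full_scale` to a 0..1 energy level are our own hypothetical choices; heart rate or sweat response are not included.

```python
import math
import statistics


def movement_energy(samples):
    """Activity of one wearer: population standard deviation of the
    acceleration magnitude over a window of (x, y, z) samples."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    if len(mags) < 2:
        return 0.0
    return statistics.pstdev(mags)


def crowd_energy(windows, full_scale=5.0):
    """Mean activity over all wearers, scaled to 0..1 (capped at 1)."""
    if not windows:
        return 0.0
    mean_activity = statistics.mean(movement_energy(w) for w in windows)
    return min(1.0, mean_activity / full_scale)
```

Using the magnitude rather than individual axes makes the measure independent of how the wristband is oriented on the wrist.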
Below, a scheme is presented with all the user needs that call for feedback sensors.
The algorithm
Based on the feedforward and feedback of the crowd, the tracks to be played and their sequence are selected. (How are we going to do this? We do not have a complete idea yet, because that is roughly our end goal...)
The next step in the algorithm is overlapping the tracks in the right way. Working algorithms that handle this task already exist, such as the one described by Cliff (2006), whose working principles we describe in this section. The overlap section is meant to move seamlessly from one track to the next. In the described technology, the time set for the overlap is proportional to the specified duration of the set and the number of tracks, making it a static time interval. The duration of this interval is altered when the tempo maps of the overlapping tracks produce a beat breakdown, or when the overlap interval would cause the set duration to be exceeded.
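The overlap rule is only described qualitatively above, so the sketch below is one possible reading under our own assumptions, not Cliff's formula. The base overlap is a fixed fraction (`base_fraction`, hypothetical) of the average slot length, i.e. the set duration divided by the number of tracks. If the tracks would then run past the end of the set, the overlap is enlarged just enough to fit. The beat-breakdown adjustment is not modelled.

```python
def static_overlap(track_durations, set_duration, base_fraction=0.1):
    """One fixed overlap (same unit as the durations) used for every transition.

    With n tracks there are n - 1 transitions, so the set lasts
    sum(durations) - (n - 1) * overlap.
    """
    n = len(track_durations)
    if n < 2:
        return 0.0
    overlap = base_fraction * set_duration / n
    total = sum(track_durations) - (n - 1) * overlap
    if total > set_duration:
        # Enlarge the overlap so that the set ends exactly on time.
        overlap = (sum(track_durations) - set_duration) / (n - 1)
    return overlap
```

For four 5-minute tracks in an 18-minute set, the base overlap of 0.45 minutes would leave 18.65 minutes of music, so the overlap is raised to 2/3 of a minute and the set ends exactly at 18 minutes.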
If the system wants to play the next track with a smooth transition and there is any difference between the tempo (BPM) of the current track and the next track, time-stretching and beat-mapping need to be applied. Time-stretching means slowing down or speeding up a track so that the tempos of the current and next track are nearly identical, which produces a smooth transition. Technically speaking, time-stretching is a (de)compression of time: the playback speed of the samples is changed and proper interpolation is applied to maintain sound quality. Once the tempos match, the beats of the two songs need to be aligned to obtain zero phase difference and avoid a beat breakdown.
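The arithmetic behind tempo matching and beat alignment is small enough to sketch. Assumptions: the tempos and the first-beat positions of both tracks are already known, and times are in seconds. The playback-rate factor is the ratio of the two tempos. The resampler changes playback speed with linear interpolation, which, like speeding up a turntable, also shifts the pitch. Pitch-preserving time-stretching (for example with a phase vocoder) is more involved and not shown. The alignment step moves the planned start of the next track so its first beat lands on the nearest beat of the current track. All function names are hypothetical.

```python
def stretch_ratio(current_bpm, next_bpm):
    """Playback-rate factor for the next track so that its tempo matches the
    current one (e.g. 1.024 means play 2.4% faster)."""
    return current_bpm / next_bpm


def resample_linear(samples, rate):
    """Change playback speed by `rate` (> 1 is faster) using linear
    interpolation between neighbouring samples. Note: this also shifts pitch."""
    n_out = int(len(samples) / rate)
    out = []
    for i in range(n_out):
        pos = i * rate
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out


def aligned_start(t_start, next_first_beat, cur_first_beat, bpm):
    """Shift the planned start time so that the next track's first beat
    (measured after stretching) coincides with the nearest beat of the
    current track, giving zero phase difference."""
    period = 60.0 / bpm
    phase = (t_start + next_first_beat - cur_first_beat) % period
    shift = -phase if phase < period / 2 else period - phase
    return t_start + shift
```

For example, a 125 BPM track mixed into a 128 BPM set is played at 1.024 times its original speed, after which its start is nudged by at most half a beat period.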
The last step is proper cross-fading. Although ramping down the volume of the current track while ramping up the volume of the next track is often sufficient for a good cross-fade, the described algorithm uses more sophisticated techniques. It analyses the audio frequency-time spectrograms of the two tracks to be cross-faded. These can be used to selectively suppress certain frequency components, so that the current melodies seem to disappear while the next melody becomes more prominent. The algorithm can also filter out components of the tracks that clash with each other, allowing for a smooth cross-fade (Cliff, 2006).
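The simple volume-ramp cross-fade that the paragraph calls often sufficient can be sketched as follows. It does not reproduce Cliff's spectrogram-based method. Two common gain curves are shown: a linear ramp, and an equal-power ramp (cosine/sine gains whose squares sum to 1), a widely used alternative that keeps the combined loudness of uncorrelated material roughly constant through the fade. The overlapping parts are assumed to be equal-length lists of samples that are already tempo-matched and beat-aligned.

```python
import math


def crossfade(a, b, kind="equal_power"):
    """Mix the overlapping parts of the outgoing track `a` and the incoming
    track `b` with a volume ramp. kind is "linear" or "equal_power"."""
    if len(a) != len(b):
        raise ValueError("overlap segments must have the same length")
    n = len(a)
    out = []
    for i in range(n):
        x = i / (n - 1) if n > 1 else 1.0  # fade position, 0 -> 1
        if kind == "linear":
            gain_a, gain_b = 1.0 - x, x
        else:
            # Equal power: gain_a**2 + gain_b**2 == 1 at every position.
            gain_a = math.cos(x * math.pi / 2)
            gain_b = math.sin(x * math.pi / 2)
        out.append(gain_a * a[i] + gain_b * b[i])
    return out
```

A frequency-selective cross-fade in the spirit of Cliff (2006) would apply such ramps separately per frequency band, for instance swapping the bass lines earlier than the melodies.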
User interface
Pre-filter
Excitement prediction with multiple regression
To come up with a usable product, we need to narrow down the scope of this project. We decided to focus on engineering a proper pre-filter and feedforward system for track selection, based on the Spotify audio features, mainly the features "energy" and "valence". Even after extensive research, no formal definition or formula was found for the Spotify feature "energy". Spotify itself describes it as a perceptual measure of intensity and activity based on dynamic range, perceived loudness, timbre, onset rate and general entropy. It is a floating point number ranging from 0 to 1. The same holds for the "valence" feature: no formula could be found, but Spotify describes it as a representation of the positivity or negativity of a track. It is a floating point number ranging from 0 to 1, where values close to 1 represent positive songs and values close to 0 represent negative (e.g. sad) songs.
Next, we rated 152 songs on "excitement" and performed a multiple regression analysis with the Spotify features "energy" and "valence" as predictors, to obtain a prediction formula for "excitement". We rated "excitement" such that it represents how enthusiastic a song is: how excited one gets by listening to it. It is a floating point number ranging from 0 to 1. Values close to 0 represent songs that will not get people enthusiastic, whereas values close to 1 represent songs that are very exciting and foster enthusiasm among listeners. Please note that, for now, the feature "excitement" is defined entirely by ourselves and is inherently subjective. However, we consider "excitement" as defined here a good parameter for selecting tracks that make the audience happy. We picked songs from three different genres of dance music: Techno, Hardstyle and Disco. We wanted to stick to dance music, but to diversify our research we considered three distinct genres. The results of the regression analyses are presented in the following sections.
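The regression step can be sketched with NumPy's least-squares solver. This is a minimal sketch and not the statistics package we used for the results below; it assumes the ratings are available as three equally long lists, and it only computes the coefficients and R-squared (no standard errors, t or p values).

```python
import numpy as np


def fit_excitement_model(energy, valence, excitement):
    """Ordinary least squares fit of excitement = b0 + b1*energy + b2*valence.

    Returns (coefficients, r_squared) with coefficients = [b0, b1, b2].
    """
    energy = np.asarray(energy, dtype=float)
    valence = np.asarray(valence, dtype=float)
    y = np.asarray(excitement, dtype=float)

    # Design matrix: a column of ones for the constant, then the two predictors.
    X = np.column_stack([np.ones_like(energy), energy, valence])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # R-squared: share of the variance in excitement explained by the model.
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r_squared = 1.0 - ss_res / ss_tot
    return coef, r_squared
```

Called with the energy, valence and excitement columns of one genre (or of all songs together), it returns the constant followed by the energy and valence coefficients.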
Multiple regression per genre
The first step in our analysis was to perform a separate multiple regression for each genre in our database (Techno, Hardstyle or Disco), to see whether the formula for "excitement" differs between genres and whether it yields any significant results at all.
Multiple regression techno
The multiple regression model for techno alone turned out significant, R-squared = 0.1, F(2, 60) = 3.39, p = 0.04, with "excitement" predicted from the Spotify features "energy" and "valence". This means that for techno the fitted formula for excitement is ex = 0.39 + 0.183*en + 0.208*v, where "ex" is excitement, "en" is energy and "v" is valence. Note that within this model only the valence coefficient is individually significant (p = 0.039); the energy coefficient is not (p = 0.121). The exact results are presented in the table below.
excitement | Coefficient | Standard Error | t | p |
---|---|---|---|---|
energy | 0.183 | 0.116 | 1.57 | 0.121 |
valence | 0.208 | 0.096 | 2.11 | 0.039 |
constant | 0.390 | 0.100 | 3.91 | 0.000 |
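The reported R-squared and F values are linked by the standard relation for a regression with k predictors and n observations: F = (R²/k) / ((1 - R²)/(n - k - 1)). The small sketch below (our own helper functions, not part of the original analysis) makes this relation explicit so the reported numbers can be checked: for techno, n = 63 songs and k = 2 give the degrees of freedom (2, 60), and F = 3.39 corresponds to an R-squared of about 0.10.

```python
def f_from_r_squared(r_squared, k, n):
    """Overall F statistic of a linear regression with k predictors and n observations."""
    df_model, df_resid = k, n - k - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


def r_squared_from_f(f_stat, df_model, df_resid):
    """Inverse of the relation above: R^2 = df_model*F / (df_model*F + df_resid)."""
    return df_model * f_stat / (df_model * f_stat + df_resid)
```

Applying `r_squared_from_f` to the hardstyle and disco results, F(2, 37) = 0.91 and F(2, 46) = 2.55, gives R-squared values that round to 0.05 and 0.1, consistent with the values reported in the text.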
Multiple regression hardstyle
The model for hardstyle turned out non-significant, R-squared = 0.05, F(2, 37) = 0.91, p = 0.41. This means that, for this sample of hardstyle songs and their excitement ratings, linear regression does not yield a significant prediction formula. The exact results are presented in the table below.
excitement | Coefficient | Standard Error | t | p |
---|---|---|---|---|
energy | -0.171 | 0.154 | -1.11 | 0.276 |
valence | -0.057 | 0.063 | -0.91 | 0.371 |
constant | 0.800 | 0.142 | 5.61 | 0.000 |
Multiple regression disco
The model for disco turned out non-significant, R-squared = 0.1, F(2, 46) = 2.55, p = 0.09. This means that, for this sample of disco songs and their excitement ratings, linear regression does not yield a significant prediction formula. The exact results are presented in the table below.
excitement | Coefficient | Standard Error | t | p |
---|---|---|---|---|
energy | 0.079 | 0.109 | 0.72 | 0.474 |
valence | -0.142 | 0.077 | -1.86 | 0.070 |
constant | 0.693 | 0.118 | 5.86 | 0.000 |
Multiple regression across genres
Next, we took all genres together and performed the same multiple regression analysis on the combined dataset. This regression turned out significant, R-squared = 0.06, F(2, 149) = 4.99, p = 0.008. This means that, when the three dance genres are considered together, a linear formula based on the audio features "energy" and "valence" significantly predicts the "excitement" of a track. This formula is ex = 0.45 + 0.16*en + 0.07*v. When we computed the predicted excitement for every song with this formula and took the absolute difference between the predicted value and the rated value, the mean difference was 0.07 with a standard deviation of 0.06. The exact results can be found in the table below.
excitement | Coefficient | Standard Error | t | p |
---|---|---|---|---|
energy | 0.160 | 0.070 | 2.28 | 0.024 |
valence | 0.070 | 0.024 | 2.93 | 0.004 |
constant | 0.450 | 0.063 | 7.13 | 0.000 |
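The across-genre formula and the error measure described above can be written out as follows. This is a sketch with our own function names; the inputs would be the energy, valence and rated excitement of each song, which are not reproduced here.

```python
def predict_excitement(energy, valence):
    """Across-genre prediction formula: ex = 0.45 + 0.16*en + 0.07*v."""
    return 0.45 + 0.16 * energy + 0.07 * valence


def mean_absolute_error(predicted, rated):
    """Mean of |predicted - rated| over all songs."""
    errors = [abs(p - r) for p, r in zip(predicted, rated)]
    return sum(errors) / len(errors)
```

Because energy and valence both lie between 0 and 1, this formula can only produce values between 0.45 (both features 0) and 0.68 (both features 1), which is worth keeping in mind when interpreting the mean absolute error of 0.07.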
Discussion of the regression results
When we took each genre separately to come up with a prediction formula for "excitement", the results were not always promising: only the regression for techno turned out significant. We think this was mainly due to the small sample sizes (only 40 hardstyle, 49 disco and 63 techno songs) and the inherently subjective nature of the feature "excitement". Only one person rated this feature for every song, making it a very subjective, personal variable. However, we should look beyond the results of this regression analysis alone. In the future, when our system is used at large music events, the variable "excitement" can be determined in a much better way than by quantifying the subjective opinion of one person. One option could be to let every attendee of a music festival rate about 10 relevant songs before attending the event. If 10,000 people do this, the system obtains 100,000 data points to base the regression on, instead of 152. We expect this to give much better results than the analysis presented here. Besides, the regression across genres already yields a model with a mean absolute error of only 0.07 on the 0 to 1 scale (7%), and we expect this error to become smaller with 100,000 data points, although this remains to be tested.
Another option could be to gather the "excitement" information in a more physical way, for example from heart rate monitors and/or brain activity sensors, as used by De Effenaar [REFERENCE TO WRITE-UP OF CONVERSATION WITH DE EFFENAAR]. Such information is more quantitative in nature than people's subjective opinions on the "excitement" level of a track, so it could serve as input that produces a more robust formula for "excitement".
In conclusion, how the information on "excitement" is gathered is a point of discussion as well as an opportunity for improvement. What matters most is that a multiple regression model may be a good way to incorporate feedforward into track selection in our system. In that sense, the value of this particular research lies not in the results of these regressions but in the method behind them.
Excitement matching with QET
System overview
Below, a graphic overview of the system is given. Note that the grey area marks the scope of our project; designing the rest is not feasible within our time budget. Besides, a lot of research has already been done on the modules outside the grey region. Mixing tracks together perfectly is a desirable skill for all DJs, and therefore a skill for which many DJs seek help in the form of technology. Due to this high demand, many well-working mixing algorithms already exist, for example (Cliff, 2006). Feedback sensors that measure the excitement of the audience also already exist: many wearable devices that measure, for example, heart rate or motion have been developed and used (see the earlier sections for more information). Another real-life example is De Effenaar [LINK TO WRITE-UP OF CONVERSATION], which has used technology to measure brain activity and excitement among attendees of a music event. This allows us to assume that the mixing and feedback parts will work.
In broad terms, the system works as follows. The system has a large database to pick music from. For every song, this database contains an "excitement" value based on the output of a multiple regression on the Spotify audio features "valence" and "energy". The music in this database is passed through the pre-filter, which filters the songs on genre and on Spotify features (which ones is still to be decided), based on the user input defined via the user interface. The pre-filter thus outputs an array of tracks that already matches the user's wishes. Next, the filtered tracks are matched with the desired QET (we use an excitement graph instead of a tempo graph, but the principle is the same). The result of this matching is a sequence of tracks that forms the playlist for that evening. This playlist is then fed to the mixing algorithm, so that the system outputs a nicely mixed set of music. The audience's excitement about this music is measured by feedback sensors, and this feedback is used to update the playlist via the excitement matching module. Thus, if the audience is not happy with the music currently playing, the system can act upon that.
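The pre-filter and excitement-matching steps described above can be sketched as below. This is a minimal sketch under our own assumptions: the `Track` class, the energy-range filter (the features to filter on are still undecided) and the greedy nearest-match strategy are hypothetical choices for illustration, not the final design. The target curve is simply a list of desired excitement values, one per playlist slot; mixing and sensor feedback are left out.

```python
from dataclasses import dataclass


@dataclass
class Track:
    title: str
    genre: str
    energy: float
    valence: float

    @property
    def excitement(self):
        # Across-genre regression formula: ex = 0.45 + 0.16*en + 0.07*v.
        return 0.45 + 0.16 * self.energy + 0.07 * self.valence


def pre_filter(tracks, genres, min_energy=0.0, max_energy=1.0):
    """Keep tracks of the requested genres within a (hypothetical) energy range."""
    return [t for t in tracks
            if t.genre in genres and min_energy <= t.energy <= max_energy]


def match_excitement(tracks, target_curve):
    """Greedy matching: for each point of the target excitement curve, pick the
    unused track whose predicted excitement is closest to that target."""
    remaining = list(tracks)
    playlist = []
    for target in target_curve:
        if not remaining:
            break  # fewer tracks than slots in the curve
        best = min(remaining, key=lambda t: abs(t.excitement - target))
        remaining.remove(best)
        playlist.append(best)
    return playlist
```

Feedback from the sensors could be incorporated by re-running `match_excitement` on the tracks that have not been played yet, with an adjusted target curve.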
References
Atherton, W. E., Becker, D. O., McLean, J. G., Merkin, A. E., & Rhoades, D. B. (2008). U.S. Patent Application No. 11/466,176.
Barkhuus, L., & Jørgensen, T. (2008). Engaging the crowd: studies of audience-performer interaction. In CHI'08 extended abstracts on Human factors in computing systems (pp. 2925-2930).
Berkers, P., & Michael, J. (2017). Just what makes today’s music festivals so appealing?
Choi, K., & Cho, K. (2019). Deep unsupervised drum transcription. In 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands.
Choi, K., Fazekas, G., Cho, K., & Sandler, M. (2017). A tutorial on deep learning for music information retrieval. arXiv preprint arXiv:1709.04396.
Cliff, D. (2006). hpDJ: An automated DJ with floorshow feedback. In Consuming Music Together (pp. 241-264). Springer, Dordrecht.
De León, P. J. P., & Inesta, J. M. (2007). Pattern recognition approach for music style identification using shallow statistical descriptors. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 37(2), 248-257.
Feldmeier, M. C. (2003). Large group musical interaction using disposable wireless motion sensors (Doctoral dissertation, Massachusetts Institute of Technology).
Feldmeier, M., & Paradiso, J. A. (2004, April). Giveaway wireless sensors for large-group interaction. In CHI'04 Extended Abstracts on Human Factors in Computing Systems (pp. 1291-1292).
Freeman, J. (2005). Large audience participation, technology, and orchestral performance. In Proceedings of the International Computer Music Conference (pp. 757–760).
Gates, C., & Subramanian, S. (2006). A Lens on Technology’s Potential Roles for Facilitating Interactivity and Awareness in Nightclub. University of Saskatchewan: Saskatoon, Canada.
Gates, C., Subramanian, S., & Gutwin, C. (2006, June). DJs' perspectives on interaction and awareness in nightclubs. In Proceedings of the 6th conference on Designing Interactive systems (pp. 70-79).
Greasley, A. E. (2017). Commentary on: Solberg and Jensenius (2016) Investigation of intersubjectively embodied experience in a controlled electronic dance music setting. Empirical Musicology Review, 11(3-4), 319-323.
Hamel, P., & Eck, D. (2010, August). Learning features from music audio with deep belief networks. In ISMIR (Vol. 10, pp. 339-344).
Hödl, O., Fitzpatrick, G., Kayali, F., & Holland, S. (2017). Design implications for technology-mediated audience participation in live music. In Proceedings of the 14th Sound and Music Computing Conference, July 5-8 2017, Aalto University, Espoo, Finland (pp. 28–34).
Hoffman, G., & Weinberg, G. (2010). Interactive jamming with Shimon: A social robotic musician. In Proceedings of the 28th International Conference Extended Abstracts on Human Factors in Computing Systems (pp. 3097–3102).
Humphrey, E. J., Durand, S., & McFee, B. (2018). OpenMIC-2018: An open dataset for multiple instrument recognition. In 19th International Society for Music Information Retrieval Conference, Paris, France.
Huron, D. (2002). Music information processing using the Humdrum toolkit: Concepts, examples, and lessons. Computer Music Journal, 26(2), 11-26.
Jannach, D., Kamehkhosh, I., & Lerche, L. (2017, April). Leveraging multi-dimensional user models for personalized next-track music recommendation. In Proceedings of the Symposium on Applied Computing (pp. 1635-1642).
Johnson, D. (n.d.). Robot DJ Used By Nightclub Replaces Resident DJs. Retrieved on 09-02-2020 from http://www.edmnightlife.com/robot-dj-used-by-nightclub-replaces-resident-djs/
Kashino, K., Nakadai, K., Kinoshita, T., & Tanaka, H. (1995). Application of Bayesian probability network to music scene analysis. Computational auditory scene analysis, 1(998), 1-15.
McAllister, G., Alcorn, M., & Strain, P. (2004). Interactive performance with wireless PDAs. In Proceedings of the International Computer Music Conference (pp. 1–4).
Pasick, A. (2015, December 21). The magic that makes Spotify's Discover Weekly playlists so damn good. Retrieved on 09-02-2020 from https://qz.com/571007/the-magic-that-makes-spotifys-discover-weekly-playlists-so-damn-good/
Pérez-Marcos, J., & Batista, V. L. (2017, June). Recommender system based on collaborative filtering for spotify’s users. In International Conference on Practical Applications of Agents and Multi-Agent Systems (pp. 214-220). Springer, Cham.
Shmulevich, I., & Povel, D. J. (1998, December). Rhythm complexity measures for music pattern recognition. In 1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No. 98EX175) (pp. 167-172). IEEE.
Wen, R., Chen, K., Xu, K., Zhang, Y., & Wu, J. (2019, July). Music Main Melody Extraction by An Interval Pattern Recognition Algorithm. In 2019 Chinese Control Conference (CCC) (pp. 7728-7733). IEEE.
Yoshii, K., Nakadai, K., Torii, T., Hasegawa, Y., Tsujino, H., Komatani, K., Ogata, T. & Okuno, H. G. (2007, October). A biped robot that keeps steps in time with musical beats while listening to music with its own ears. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 1743-1750). IEEE.
Zhang, L., Wu, Y., & Barthet, M. (in press). A web application for audience participation in live music performance: The Open Symphony use case. NIME. Retrieved from https://core.ac.uk/reader/77040676