PRE2024 3 Group15


Group members: Nikola Milanovski, Senn Loverix, Illie Alexandru, Matus Sevcik, Gabriel Karpinsky

Work done last week

Name Total Breakdown
Nikola
Senn 12h Studied papers (5h), wrote summaries (2h), Meeting (1.5h), Wrote Problem statement and objectives (2h), Wrote Users (1.5h)
Illie
Matus
Gabriel

Introduction and plan

Problem statement and objectives

The synthesizer has become an essential instrument in the creation of modern music. It allows musicians to create and modulate sounds electronically. Traditionally, an analog synthesizer uses a keyboard to generate notes, and various knobs, buttons and sliders to manipulate the sound. With MIDI (Musical Instrument Digital Interface), however, the synthesizer can be controlled by an external device, usually also shaped like a keyboard, and the other controls become digital. This opens up a wide range of possible input devices for manipulating a digital synthesizer. Although traditional keyboard MIDI controllers have been very successful, their form may restrict the expressiveness of musicians who seek to create more dynamic and unique sounds, and may limit accessibility for people who struggle with the controls, for example due to a lack of keyboard-playing experience or a physical impairment.

The aim of this project is to design a new way of controlling a synthesizer using the motion of the user's hand. By moving the hand to a certain position in front of a suitable sensor system consisting of one or more cameras, various aspects of the produced sound can be controlled, such as pitch or reverb. Computer vision techniques will be implemented in software to track the position of the user's hand and fingers, and different hand positions and orientations will be mapped to operations on the sound. Through MIDI, this information is passed to synthesizer software that produces the electronic sound. We aim to let users in the music industry adopt this technology seamlessly, creating brand-new sounds in an innovative, easy-to-control way that is more accessible than a traditional synthesizer.
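
To make the intended pipeline concrete, the sketch below shows one way the tracking-to-MIDI chain could look. It is a minimal sketch, not the final design: it assumes MediaPipe Hands for tracking and mido for MIDI output, while the actual software stack will only be chosen later in the project (see the plan below). The choice of CC 1 and the wrist landmark are illustrative.

```python
# Minimal sketch: map vertical hand position to a MIDI control change.
# Assumes MediaPipe Hands for tracking and mido for MIDI output; both are
# candidate libraries, not final choices.
import cv2
import mediapipe as mp
import mido

mp_hands = mp.solutions.hands

def run():
    midi_out = mido.open_output()          # default system MIDI output port
    cap = cv2.VideoCapture(0)              # laptop webcam
    with mp_hands.Hands(max_num_hands=1) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                wrist = results.multi_hand_landmarks[0].landmark[mp_hands.HandLandmark.WRIST]
                # wrist.y is normalized (0 = top of frame); invert so a higher hand gives a higher value
                value = int((1.0 - wrist.y) * 127)
                # CC 1 (mod wheel) is chosen arbitrarily here; any synth parameter could be mapped instead
                midi_out.send(mido.Message('control_change', control=1,
                                           value=max(0, min(127, value))))
    cap.release()

if __name__ == "__main__":
    run()
```

Any synthesizer software that listens on the same MIDI port (hardware or virtual) would then respond to the hand movement in real time.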

Users

With this innovative way of producing music, the main targets for this technology are users in the music industry. Such users include performance artists and DJs, who can use the technology to enhance their live performances or sets. Visual artists and motion-based performers could integrate the technology into their choreography. Other users include music producers looking to create unique sounds or rhythms in their tracks. Content creators who use audio equipment, such as soundboards, to enhance their content could use the technology as a new way to seamlessly control the audio of their content.

This new way of controlling a synthesizer could also be a great way to introduce people to creating and producing electronic music. It would be especially useful for people with a physical impairment that may have restricted them from creating the music they wanted before.

User requirements

For the users mentioned above, we have drawn up a list of requirements we expect them to have for this new synthesizer technology. First of all, it should be easy to set up for performance artists and producers, so they do not spend too much time preparing right before a performance or set. Next, the technology should be easily accessible and easy to understand for all users, both people with a lot of experience in electronic music and people who are relatively new to it.

Furthermore, the hand tracking should work in different environments. For example, a DJ who works in dimly lit clubs with many different lighting and visual effects during their sets should still be able to rely on accurate hand tracking. It should also be easy to integrate the technology into the artist's workflow: an artist should not have to change their entire performing or production routine to use a motion-based synthesizer.

Lastly, the technology should allow elaborate customization to fit each user's needs. The user should be able to decide which attributes of the recognized hand gestures matter for their work and which should be ignored. For example, if the vertical position of the hand regulates the pitch of the sound and rotation of the hand the volume, the user should be able to 'turn off' the volume regulation so that rotating the hand changes nothing. A sketch of such a configurable mapping is given below.
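
The sketch below illustrates this kind of customization with hypothetical names (not a final design): each gesture attribute drives one MIDI control and carries an "enabled" flag the user can switch off independently.

```python
# Hypothetical mapping configuration: each gesture attribute drives one synth
# parameter and can be switched off independently, as described above.
from dataclasses import dataclass

@dataclass
class Mapping:
    attribute: str      # e.g. "hand_height", "hand_rotation"
    midi_control: int   # MIDI CC number it drives
    enabled: bool = True

mappings = [
    Mapping("hand_height", midi_control=1),                  # pitch / mod wheel
    Mapping("hand_rotation", midi_control=7, enabled=False)  # volume, turned off by the user
]

def gesture_to_cc(gesture: dict) -> list[tuple[int, int]]:
    """Convert measured gesture attributes (normalized 0..1) to (CC, value) pairs,
    skipping any mapping the user has disabled."""
    out = []
    for m in mappings:
        if m.enabled and m.attribute in gesture:
            out.append((m.midi_control, int(gesture[m.attribute] * 127)))
    return out

# Example: rotating the hand changes nothing because that mapping is disabled.
print(gesture_to_cc({"hand_height": 0.5, "hand_rotation": 0.9}))  # -> [(1, 63)]
```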

To get a better understanding of the user requirements, we plan to interview people in the music industry, such as music producers, a DJ and an audiovisual producer. The interview questions are as follows:

  • Background and Experience:
    • What tools or instruments do you currently use in your creative process?
    • Have you previously incorporated technology into your performances or creations? If so, how?
  • Creative Process and Workflow:
    • Can you describe your typical workflow? How do you integrate new tools or technologies into your practice?
    • What challenges do you face when adopting new technologies in your work?
  • Interaction with Technology:
    • Have you used motion-based controllers or gesture recognition systems in your performances or art? If yes, what was your experience?
    • How do you feel about using hand gestures to control audio or visual elements during a performance?
    • What features would you find most beneficial in a hand-motion recognition controller for your work?
  • Feedback on Prototype:
    • What specific functionalities or capabilities would you expect from such a device?
    • How important is the intuitiveness and learning curve of a new tool in your adoption decision?
  • Performance and Practical Considerations:
    • In live performances, how crucial is the reliability of your equipment?
    • What are your expectations regarding the responsiveness and accuracy of motion-based controllers?
    • How do you manage technical issues during a live performance?
    • How important are the design and aesthetics of the tools you use?
    • Do you have any ergonomic preferences or concerns when using new devices during performances?
    • What emerging technologies are you most excited about in your field?

Approach, milestones and deliverables

  • Market research interviews with musicians, music producers etc.
    • Requirements for hardware
    • Ease of use requirements
    • Understanding of how to seamlessly integrate our product into a musician's workflow.
  • Find software stack solutions
    • Library for hand tracking
    • Encoder to MIDI or another viable format.
    • Synthesizer that can accept live inputs in chosen encoding format.
    • Audio output solution
  • Find hardware solutions
    • Camera / visual input
      • Multiple cameras
      • IR depth tracking
      • Viability of a standard webcam (laptop or otherwise)
  • MVP (Minimal viable product)
    • Create a demonstration product proving the viability of the concept by modifying a single synthesizer using basic hand gestures and a laptop webcam or other easily accessible camera (see the sketch after this list).
  • Test with potential users and get feedback
  • Refined final product
    • Additional features
    • Ease of use and integration improvements
    • Testing on different hardware and software platforms
    • Visual improvements to the software
    • Potential support for more encoding formats or additional input methods other than hand tracking
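
As a sketch of what the MVP step could look like, the snippet below triggers a MIDI note when the thumb tip and index fingertip pinch together, using only a webcam. It again assumes MediaPipe and mido as placeholder libraries; the pinch threshold and the note number are arbitrary values to be tuned during testing.

```python
# MVP sketch (assumed libraries: MediaPipe + mido): a pinch between thumb tip and
# index fingertip triggers a MIDI note, demonstrating basic gesture control with a webcam.
import math
import cv2
import mediapipe as mp
import mido

mp_hands = mp.solutions.hands
midi_out = mido.open_output()
note_on = False

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            thumb = lm[mp_hands.HandLandmark.THUMB_TIP]
            index = lm[mp_hands.HandLandmark.INDEX_FINGER_TIP]
            pinch = math.hypot(thumb.x - index.x, thumb.y - index.y)
            # 0.05 (normalized image coordinates) is a guessed threshold, to be tuned in user tests
            if pinch < 0.05 and not note_on:
                midi_out.send(mido.Message('note_on', note=60, velocity=100))   # middle C
                note_on = True
            elif pinch >= 0.05 and note_on:
                midi_out.send(mido.Message('note_off', note=60))
                note_on = False
cap.release()
```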

Who is doing what?

Nikola - Interface with audio software

Senn - Hardware interface

Gabriel, Illie, Matus - Software processing of input and producing output



State of the art

[1] “A MIDI Controller based on Human Motion Capture (Institute of Visual Computing, Department of Computer Science, Bonn-Rhein-Sieg University of Applied Sciences),” ResearchGate. Accessed: Feb. 12, 2025. [Online]. Available: https://www.researchgate.net/publication/264562371_A_MIDI_Controller_based_on_Human_Motion_Capture_Institute_of_Visual_Computing_Department_of_Computer_Science_Bonn-Rhein-Sieg_University_of_Applied_Sciences

[2] M. Lim and N. Kotsani, “An Accessible, Browser-Based Gestural Controller for Web Audio, MIDI, and Open Sound Control,” Computer Music Journal, vol. 47, no. 3, pp. 6–18, Sep. 2023, doi: 10.1162/COMJ_a_00693.

[3] M. Oudah, A. Al-Naji, and J. Chahl, “Hand Gesture Recognition Based on Computer Vision: A Review of Techniques,” J. Imaging, vol. 6, no. 8, p. 73, Jul. 2020, doi: 10.3390/jimaging6080073.

[4] A. Tagliasacchi, M. Schröder, A. Tkach, S. Bouaziz, M. Botsch, and M. Pauly, “Robust Articulated‐ICP for Real‐Time Hand Tracking,” Computer Graphics Forum, vol. 34, no. 5, pp. 101–114, Aug. 2015, doi: 10.1111/cgf.12700.

This article provides a method for capturing and tracking hand motions using only a depth camera providing RGB-D data. The algorithm attempts to fit a 3D hand model to the provided image using depth, silhouette and point-cloud information. The iterative closest point (ICP) optimization works as follows: correspondences between the perceived depth image and a candidate hand model are computed, after which the hand model is adjusted to minimize the error between them.
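
The correspond-then-minimize idea at the core of ICP can be illustrated with the simplified sketch below. This is rigid point-to-point alignment only (numpy, closed-form Kabsch/SVD solution); the paper's actual method fits an articulated hand model with additional silhouette and depth terms, which is not reproduced here.

```python
# Simplified illustration of one point-to-point ICP iteration (rigid alignment only).
import numpy as np

def icp_step(model: np.ndarray, scan: np.ndarray) -> np.ndarray:
    """One ICP iteration: match each model point to its closest scan point,
    then apply the rigid transform (R, t) minimizing the squared distances."""
    # 1. Correspondence: nearest scan point for every model point (brute force)
    dists = np.linalg.norm(model[:, None, :] - scan[None, :, :], axis=2)
    matched = scan[np.argmin(dists, axis=1)]

    # 2. Minimization: closed-form rigid transform via the Kabsch/SVD method
    mu_m, mu_s = model.mean(axis=0), matched.mean(axis=0)
    H = (model - mu_m).T @ (matched - mu_s)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_s - R @ mu_m
    return model @ R.T + t            # adjusted model points

# Usage: iterate until the model stops moving.
model = np.random.rand(50, 3)                    # stand-in for hand-model points
scan = model + np.array([0.1, 0.0, -0.05])       # stand-in for depth-camera points
for _ in range(20):
    model = icp_step(model, scan)
```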

[5] A. Tkach, A. Tagliasacchi, E. Remelli, M. Pauly, and A. Fitzgibbon, “Online generative model personalization for hand tracking,” ACM Transactions on Graphics, vol. 36, no. 6, pp. 1–11, Nov. 2017, doi: 10.1145/3130800.3130830.

This paper discusses a real-time hand tracking method that does not require extensive calibration. The method learns the user's hand shape dynamically as they move their hand in front of a regular depth camera. The algorithm tracks the hand movements while updating the hand model to a more accurate one. The proposed system also handles uncertainty when certain parts of the hand cannot be directly observed, and updates these values when more accurate measurements become available.

[6] T. Winkler, Composing Interactive Music: Techniques and Ideas Using Max. Cambridge, MA, USA: MIT Press, 2001.

[7] E. R. Miranda and M. M. Wanderley, New Digital Musical Instruments: Control and Interaction Beyond the Keyboard. Middleton, WI, USA: AR Editions, Inc., 2006.

[8] D. Hosken, An Introduction to Music Technology, 2nd ed. New York, NY, USA: Routledge, 2014. doi: 10.4324/9780203539149.

[9] P. D. Lehrman and T. Tully, "What is MIDI?," Medford, MA, USA: MMA, 2017.

[10] C. Dobrian and F. Bevilacqua, Gestural Control of Music Using the Vicon 8 Motion Capture System. UC Irvine: Integrated Composition, Improvisation, and Technology (ICIT), 2003.

[11] J. L. Hernandez-Rebollar, “Method and apparatus for translating hand gestures,” US7565295B1, Jul. 21, 2009 Accessed: Feb. 12, 2025. [Online]. Available: https://patents.google.com/patent/US7565295B1/en

[12] I. Culjak, D. Abram, T. Pribanic, H. Dzapo, and M. Cifrek, “A brief introduction to OpenCV,” in 2012 Proceedings of the 35th International Convention MIPRO, May 2012, pp. 1725–1730. Accessed: Feb. 12, 2025. [Online]. Available: https://ieeexplore.ieee.org/document/6240859/?arnumber=6240859

[13] K. V. Sainadh, K. Satwik, V. Ashrith, and D. K. Niranjan, “A Real-Time Human Computer Interaction Using Hand Gestures in OpenCV,” in IOT with Smart Systems, J. Choudrie, P. N. Mahalle, T. Perumal, and A. Joshi, Eds., Singapore: Springer Nature Singapore, 2023, pp. 271–282.

[14] V. Patil, S. Sutar, S. Ghadage, and S. Palkar, “Gesture Recognition for Media Interaction: A Streamlit Implementation with OpenCV and MediaPipe,” International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2023.

[15] A. P. Ismail, F. A. A. Aziz, N. M. Kasim, and K. Daud, “Hand gesture recognition on python and opencv,” IOP Conf. Ser.: Mater. Sci. Eng., vol. 1045, no. 1, p. 012043, Feb. 2021, doi: 10.1088/1757-899X/1045/1/012043.

[16] R. Tharun and I. Lakshmi, “Robust Hand Gesture Recognition Based On Computer Vision,” in 2024 International Conference on Intelligent Systems for Cybersecurity (ISCS), May 2024, pp. 1–7. doi: 10.1109/ISCS61804.2024.10581250.

[17] E. Theodoridou et al., “Hand tracking and gesture recognition by multiple contactless sensors: a survey,” IEEE Transactions on Human-Machine Systems, vol. 53, no. 1, pp. 35–43, Jul. 2022, doi: 10.1109/thms.2022.3188840.

This survey focuses on the problem of occlusion when using a single camera for hand tracking and gesture recognition, and on the different methods that can be used to solve it. RGB cameras provide high-resolution color information and are cost efficient, but do not provide depth information. Infrared (IR) sensors can capture very detailed hand movements, but are sensitive to other sources of IR light. Depth sensors such as RGB-D cameras can be used to construct an accurate depth image of the hand, but cost more than an RGB camera and sometimes do not reach the same resolution. One method to provide more accurate hand tracking is to use wearable sensors; however, these may be impractical or uncomfortable in certain scenarios. Another, more accurate method is to use multiple cameras (RGB/IR/depth) to achieve more robust tracking and gesture recognition in various environments. The disadvantage is that this increases both system complexity and cost, and requires synchronization between the sensors.

[18] G. M. Lim, P. Jatesiktat, C. W. K. Kuah, and W. T. Ang, “Camera-based Hand Tracking using a Mirror-based Multi-view Setup,” IEEE Engineering in Medicine and Biology Society. Annual International Conference, pp. 5789–5793, Jul. 2020, doi: 10.1109/embc44109.2020.9176728.

This paper presents a hand-tracking method that uses a single color camera and mirrors to improve tracking accuracy. The setup consists of a single mobile-phone camera and two plane mirrors, which reduces the cost compared to using multiple cameras. The system detects 21 keypoints distributed over the hand using a lightweight convolutional neural network, after which various computer vision techniques are used to fit a hand model to the image. This technique results in fewer tracking errors than single-camera systems.

[19] P. Rahimian and J. K. Kearney, “Optimal camera placement for motion capture systems,” IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 3, pp. 1209–1221, Dec. 2016, doi: 10.1109/tvcg.2016.2637334.

This paper concerns optimal camera placement for motion capture in virtual and augmented reality. The main challenge is capturing all relevant markers in a 3D environment using two cameras. Various optimization techniques are used to find the best arrangement (position and angle) for these cameras, since each marker must be visible to both cameras. Optimizing this placement is crucial to guarantee accurate motion tracking.