PRE2020 3 Group6: Difference between revisions
TUe\20160255 (talk | contribs) (→User) |
Revision as of 10:16, 25 March 2021
Sign to text software
Group Members
Name | Student ID | Department | Email address |
---|---|---|---|
Ruben Wolters | 1342355 | Computer Science | r.wolters@student.tue.nl |
Pim Rietjens | 1321617 | Computer Science | p.g.e.rietjens@student.tue.nl |
Pieter Michels | 1307789 | Computer Science | p.michels@student.tue.nl |
Sterre van der Horst | 1227255 | Psychology and Technology | s.a.m.v.d.horst1@student.tue.nl |
Sven Bierenbroodspot | 1334859 | Automotive Technology | s.a.k.bierenbroodspot@student.tue.nl |
Problem Statement and Objective
At the moment, 466 million people suffer from hearing loss, and it has been predicted that this number will increase to 900 million by 2050. Hearing loss has, among other things, a social and emotional impact on one's life. The inability to communicate easily with others can cause an array of negative emotions such as loneliness, feelings of isolation, and sometimes frustration [1]. Although there are many different speech recognition technologies for live subtitling that can help people who are deaf or hard of hearing, hereafter referred to as DHH, these feelings can still be exacerbated during online meetings. DHH individuals must concentrate on the person talking, the interpretation, and any potential interruptions that can occur [2]. Furthermore, to take part in the discussion, they must be able to react spontaneously in conversation. However, not everyone understands sign language, which makes communicating even more difficult. Nowadays, especially due to the COVID-19 pandemic, it is becoming more normal to work from home, and therefore the number of online meetings is increasing quickly [3]. This leads us to our objective: to develop software that translates sign language to text to help DHH individuals communicate in an online environment. This system will be a tool that DHH individuals can use to communicate during online meetings. The number of people that have to work or be educated from home has rapidly increased due to the COVID-19 pandemic [4]. This means that the number of DHH individuals that have to work in online environments has also increased. Previous studies have shown that DHH individuals obtain a lower score on an Academic Engagement Form for communication compared to students with no disability [5]. This finding can be explained by the fact that DHH people are usually unable to understand speech without aid.
This aid can be a hearing aid, technology that converts speech to text, or an interpreter; the latter, however, is expensive and not available to most DHH individuals. To talk or react to other people, DHH individuals can use pen and paper or, in an online environment, typing. However, this is a lot slower than speech or sign language, which makes it almost impossible for DHH individuals to keep up with the impromptu nature of discussions or online meetings [6]. Therefore, by creating software that can convert sign language to text, or even to speech, DHH individuals will be able to actively participate in meetings. To do this, it is important to understand what sign language is. The following section of this wiki page explains what sign language is and what its different elements are.
Sign Language: what is it?
Sign language is a natural language that is predominantly used by people who are deaf or hard of hearing, but also by hearing people. Of all the children who are born deaf, 9 out of 10 are born to hearing parents. This means that the parents often have to learn sign language alongside the child [7].
Sign language is comparable to spoken language in the sense that it differs per country. American Sign Language (ASL) and British Sign Language (BSL) were developed separately and are therefore distinct languages, meaning that people who use ASL will not necessarily be able to understand BSL [7].
Sign language does not express single words; it expresses meanings. For example, the word right has two definitions. It could mean correct, or the opposite of left. In spoken English, right is used for both meanings. In sign language, there are different signs for the different definitions of the word right. A single sign can also express an entire sentence. By varying the hand orientation and direction, the meaning of the sign, and therefore the sentence, changes [8].
Having said that, all sign languages rely on certain parameters, or a selection of these parameters, to indicate meaning. These parameters are [9]:
- Handshape: the general shape one's hands and fingers make;
- Location: where the sign is located in space; body and face are used as reference points to indicate location;
- Movement: how the hands move;
- Number of hands: this naturally refers to how many hands are used for the sign, and it also refers to the ‘relationship of the hands to each other’;
- Palm orientation: this is how the forearm and wrist rotate when signing;
- Non-manuals: this refers to the face and body. Facial expressions can be used for different meanings or lexical distinctions. They can also be used to indicate mood, topics, and aspects.
According to the study by Tatman, the first three parameters are universal in all sign languages. However, using facial expressions for lexical distinctions is something that is not used in most languages. The use of parameters also depends on the cultural and cognitive context and feasibility of those parameters [9].
USE ## SUPPORT WITH STATISTICS
In this section, we will highlight how each of the aspects of USE relates to our system.
User
The target user group of our software is people who are not able to express themselves with speech. There are different causes for people not being able to express themselves. People who are deaf or hard of hearing often cannot speak. There are also people who cannot speak due to physical disabilities. When a person cannot express themselves through speech, they are usually able to express themselves through sign language. Because of this non-verbal manner of expression, the target group is not able to function in a meeting like a person who is able to speak. This holds for both online and offline meetings. A direct solution for this is to hire an interpreter to translate sign language into speech, working as a proxy for the DHH person. An interpreter typically costs $18-$50 per hour, but this can increase up to $125 an hour [10]. The average hourly wage in the USA is around $35 [11], meaning that DHH individuals who need to pay for interpretation out of pocket may not be able to do so. Furthermore, individuals who need this extra help may be less attractive financially to employers because of the extra costs. The proposed software would eliminate the need for interpreters, and therefore all the aforementioned problems linked to this. Online meetings would become more inclusive for individuals who use sign language, as the software would allow those individuals to participate actively.
Besides this, people who work with, or are related to, DHH individuals are also members of our target group. Communication between DHH and non-DHH people can be difficult for both parties. Facilitating this communication requires either that both persons involved know some form of sign language or that both use text. Some DHH individuals can also ... Neither of these options seems realistic in a professional meeting environment. Using only text is accessible to everyone, but is extremely slow and inefficient. Sign language, on the other hand, is much more responsive, but not everyone has access to it. While our software does not directly affect this side of the target group, it does allow for more inclusivity of DHH people in the professional world, particularly in the current state of working life due to the COVID-19 crisis.
Finally, our product will enable the user to express themselves using not only text but also their facial expressions. This will help the user to function in meetings more easily. In the past 10 years, the number of people working from home (teleworking) has increased slowly, depending on the sector and occupation. However, since the start of the COVID-19 pandemic, it has been estimated that around 40% of working individuals in the EU switched to working from home full-time [12]. It has also been suggested that this shift in work culture is something that could stay, even after the COVID-19 pandemic. Therefore, by allowing individuals who rely on sign language to communicate naturally, teleworking will become as easy for them as for anyone else, which could eventually also lead to more possibilities in the job market.
How many people are deaf? How many work online? So how many would benefit from this?
Society
The deaf, hard of hearing and mute communities are societal stakeholders. Besides these, the government, employers, and educational institutions are also stakeholders.
The deaf, hard of hearing and mute communities will be impacted by this as they will directly benefit from this software. It will increase inclusivity, which is an important aspect.
The government, employers, and educational institutions are affected in the sense that they can live up to any inclusivity policies they have. As mentioned before, communicating online can be difficult for people that use sign language. The inability of those individuals to respond (quickly) decreases inclusivity. This software would increase inclusivity again, which is important to stakeholders such as the government, employers, and educational institutions.
What is the impact of this on society?
DHH individuals can work from home more <-- COVID-19 = more working from home
Promote collaboration and thereby counter negative preconceived notions
Enterprise
As mentioned previously, an important aspect of the software is the integration into a professional work environment. Since most work-related meetings happen online nowadays, it is only natural to use technology to give people who are usually unable to take part in such work as much opportunity as possible to do so. An especially common platform for online meetings is Microsoft Teams, which is used in over 500,000 organizations to facilitate meetings [13]. Within this platform, there is the possibility of integrated apps. Our vision for the software is to create such an integration with Microsoft Teams, allowing DHH individuals to take part in discussions more easily.
How much more efficiently do they communicate? How much time and money would this save?
Comparable software
Design concept
The user will participate in an online meeting using a camera of sufficient quality. The video feed of the user will be read by the sign to text software. The software will detect when it should be extracting frames, either through the detection of a hand in the frame or through manual activation. The extracted frames will be used to compute the output using a neural network. The neural network will be trained using a dataset of sign language. It will be trained to detect single letters and numbers. The program will take the input frames and match each of them with one of the letters or numbers.
The output of the program is the sequence of letters and numbers the user has signed. The output will be displayed like subtitles would be. The delay of these subtitles will have to be as low as possible, since a shorter delay keeps the text in step with the user's facial expressions and so increases the effect of those expressions on their sentences.
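One way the per-frame predictions could be turned into a subtitle string is sketched below. This is only an illustration under assumptions that the design above does not state: that the classifier emits one label per extracted frame, that `None` stands for "no hand detected", and that a character is accepted only after it has been predicted for `min_run` consecutive frames. The names `frames_to_text` and `min_run` are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Optional


def frames_to_text(labels: Iterable[Optional[str]], min_run: int = 3) -> str:
    """Collapse per-frame labels into the signed character sequence.

    A label is emitted once per run of at least `min_run` identical
    consecutive frames; None (no hand detected) never produces output.
    """
    out = []
    for label, run in groupby(labels):
        if label is not None and sum(1 for _ in run) >= min_run:
            out.append(label)
    return "".join(out)


# Hypothetical per-frame predictions: "S", a one-frame misclassification, a pause, then "V"
frames = ["S", "S", "S", "S", "E", None, None, "V", "V", "V"]
print(frames_to_text(frames))
```

With this scheme a doubled letter is only recognised if the signer briefly lowers or changes the hand between the two signs, and a larger `min_run` trades robustness against the subtitle delay discussed above.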
State of the art
Fortunately, research into sign language recognition has already been done many times before this project. The challenges that previous investigations have run into are well documented and defined, allowing us to identify them early in the project and work around them. Furthermore, due to the large amount of research already done into sign language recognition, it is easier for us to identify which methods to pick for this project.
In this state-of-the-art section the most up-to-date findings are briefly discussed. First, the challenges that this project can face are considered. Second, several different approaches that have been taken are reviewed. Finally, based on this information, the method that has been chosen for this project is explained.
Challenges
The first and most obvious challenge originates from sign language itself. Signs are often not identifiable based on one frame or image. Almost all signs require some sort of motion, which cannot be captured in a single image. Therefore, whatever classifier is chosen for this project, it will most likely have to identify signs from video, which is significantly harder than recognizing signs from just one image. Classifying signs from video, although much more difficult than classifying from images, is not impossible. An example of a successful sign language recognizer is given by Pu, Junfu, Zhou, Wengang and Li, Houqiang in their 2019 paper “Iterative Alignment Network for Continuous Sign Language Recognition” [14]. In this paper a method is proposed to map a number T of frames to a number L of words or signs. Unfortunately, this method combines several highly advanced neural network techniques, which are outside the scope of this course. Furthermore, this method is designed specifically for use with 3D videos, which is something users of this product might not have access to.
Furthermore, signs can be executed at different speeds. There will always be marginal differences between signers, but a significant speed difference in the execution of a sign can also carry meaning. For instance, by making the sign for running much faster than normal, the signer can convey that they had to run very fast to get somewhere.
One more issue that sign language recognizers can run into is the use of depth in signs. A good example of a sign where this can become problematic is the sign for ‘gray’ in American Sign Language (ASL), which is done as shown in this video: https://www.signingsavvy.com/search/gray. Notice how the signer must wave his or her hands through each other for a few seconds. Using a 2D camera this depth is impossible to capture. Evidently, requiring users of this product to have a 3D camera or a setup with multiple cameras is not a feasible option.
Moreover, signs can look very much like each other. Again, the sign for ‘gray’ comes to mind. The signs for ‘gray’ and ‘whatever’ are very similar (see this link for how ‘whatever’ can be signed: https://www.signingsavvy.com/search/whatever). Both signs have the signer moving their hands back and forth with very little difference in hand gesture. A way around this problem can be found in the facial expression of the signer. As shown by U. von Agris, M. Knorr and K. Kraiss, the facial expressions that are made during signing are key to recognizing the correct sign [15]. This can also be seen when attention is paid to the face of the signers in the videos for the ‘gray’ and ‘whatever’ signs. During execution of the sign, it is almost as if the signer is saying the word out loud, the only difference being that no sound is made. However, as shown in the paper, mouthing is not the only facial feature that can help with classification: the facial expression of the signer can also change the meaning of a sign. Therefore, to make even better sign language recognition software, facial expressions should be considered as well.
The fact that signs can look very much alike is not really an issue that is specific to this project. In general, for all neural networks it will be very hard to classify two images that are very similar. Even when adding another dimension this problem is not easy to solve for hand gestures, as shown by L. K. Phadtare, R. S. Kushalnagar and N. D. Cahill [16]. In this paper an algorithm is proposed to detect hand gestures using the Kinect, a camera that has depth perception. The algorithm is shown to be incredibly accurate in distinguishing hand gestures that are very different from one another. However, the authors also show that the smaller the difference between gestures, the less accurate the classification will be.
However, the fact that the face can impact the meaning of signs so much means that the classifier should look at more than just the hands, adding another layer of difficulty to this already hard problem. Luckily, this is an issue that has already been looked at before by K. Imagawa, Shan Lu and S. Igi [17]. In this paper methods to distinguish specifically the face and hands from a 2D image are discussed. The idea of the proposed solution is to use a color mapping based on the skin color of the signer to identify hand and face ‘blobs’ in the image. Although the findings in this paper are very interesting, the implementation difficulty and time must be considered.
Another thing that might be problematic specifically for this project is the fact that the background used in the images or videos that need to be classified can be very cluttered. This can lead to the classifier not being able to distinguish between the hands of the signer and the background, leading to wrong classifications. Although in meetings it is usually customary to have a clear background, this is not something that can be expected of potential users of this product.
Approaches
To make sign language recognizers several approaches have been taken throughout the years. Some of these approaches have been highlighted by Cheok, M.J., Omar, Z. and Jaward, M.H. in their article reviewing existing hand gesture and sign recognition techniques [18]. In this article the several approaches have been split into two major classes: vision and sensor based.
Vision based
Vision based sign language recognition, as the name implies, uses videos or images acquired using one or more cameras, possibly extended with more sophisticated techniques.
The easiest approach in the vision-based category, at least to set up, is the one using only one camera. This results in a recognizer classifying signs based on 2D images or videos. However, this single camera can be extended to a combination of several cameras in order to give the data representation some sense of depth.
Both approaches can be expanded on using markers to more easily identify and even track hands. Examples of this are mentioned in the article by Cheok, M.J., Omar, Z. and Jaward, M.H. as well: signers can be asked to wear two differently colored gloves or little wristbands. These colored gloves for instance were used in the paper mentioned earlier by K. Imagawa, Shan Lu and S. Igi [19].
Sensor based
Sensor based sign language recognition, although very promising and interesting, is also much harder to build and to set up. Sensor based approaches require measuring something using one or more sensors or instruments. These sensors could be used to track, for instance, the position of the hands during signing, resulting in a more abstract ‘image’ of where the hands are during a sign. However, the sensors could also be used to measure things like velocity and acceleration, which, as discussed earlier, is probably very helpful since most signs cannot be made by holding the hands in one position. One step further would be measuring biological signals, like the electrical pulses through the muscles required to make a sign.
Our solution
The solution that was used in this project mostly came down to what can be expected from users of the product. The goal of this project is a sign language to text interpreter so that deaf people can participate in online meetings more easily. Since users cannot be expected to have expensive 3D cameras or to acquire all kinds of equipment before the classifier works, we chose the solution that is easiest to implement: the product will be a convolutional neural network that classifies 2D images as the correct signs. Do note that this being the ‘easiest’ solution does not imply that it will work very well. In fact, due to the lack of information caused by only having access to a 2D image or video feed, the product might end up recognizing sign language poorly.
Technical specifications
Classifier part: To accurately detect what the user wants to convey from video footage of them using sign language, we have decided to build a classifier model. In this scenario, a classifier model should assign to frames of the video footage the sign that is actually shown in them.
To make this classifier we have opted to use neural networks. Building a classifier requires a dataset containing labelled images for the classes that the classifier is supposed to detect. This data is then divided into training, validation and test datasets, where no two sets overlap. The training set is the data on which the models of the neural network are trained, the validation set is used to compare performance between models, and the test set is used to measure the accuracy of the final model.
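The split into three non-overlapping sets can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `split_dataset`, the 70/15/15 proportions, the fixed seed, and the `(filename, label)` pairs are all assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    samples: Sequence[T],
    val_fraction: float = 0.15,
    test_fraction: float = 0.15,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then cut into disjoint train/validation/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical (filename, label) pairs standing in for labelled images
data = [(f"img_{i}.png", "ABC"[i % 3]) for i in range(100)]
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

Because every sample ends up in exactly one slice of the shuffled list, no image can appear in two sets. If the dataset contains several images of the same signer, splitting per signer instead of per image would give a stricter test.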
Within sign language we have found two distinctly different types of signs, which require different approaches to building an effective classifier: signs that consist of a single hand position and do not contain movement, such as the sign for the number 1, and signs throughout which the hands appear in different positions and thus naturally contain movement, such as the sign for butterfly. The main difference for the classifiers of these two types of signs is that all the information for the first type can be detected from a single frame, while the second type requires multiple frames to convey all the relevant information about the sign. Henceforth we will refer to these types as single-frame and multiple-frame signs.
Building a classifier for strictly single-frame signs is considerably easier, as the classifier only needs to look at a single frame. The input of the classifier consists of a single image file with a fixed resolution. As such, the data is uniform by default and there is little need for pre-processing. When fed into the neural network, the image file is represented as a matrix of pixel values. For the single-frame alphabet signs, we used images of 200×200 pixels converted to greyscale. Using greyscale images means the input for the neural network and its models consists of a single matrix of 200 by 200 values. Had we used colour images, each image would consist of three 2-dimensional matrices, one for each colour channel in RGB (Red, Green, Blue).
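The difference between a colour and a greyscale input can be illustrated with a small sketch. The text does not say how the images were converted; the weighted sum below uses the common ITU-R BT.601 luma weights as an assumption, and the function name `to_greyscale` and the scaling to the range 0 to 1 are likewise choices made for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def to_greyscale(rgb: List[List[Pixel]]) -> List[List[float]]:
    """Collapse three colour channels into one value per pixel, scaled to [0, 1]."""
    # ITU-R BT.601 luma weights; a plain average of the channels would also work
    return [
        [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
        for row in rgb
    ]


# A hypothetical 200x200 RGB image filled with one colour
rgb_image = [[(255, 128, 0)] * 200 for _ in range(200)]
grey = to_greyscale(rgb_image)
print(len(grey), len(grey[0]))  # rows and columns of the single-channel matrix
```

The result is one 200 by 200 matrix instead of three, so the first layer of the network has a third as many input values to process.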
To create the classifier for single-frame signs, we used a neural network with convolution layers and max pooling layers. A convolution layer applies a kernel to each group of neighbouring values in the matrix and returns a single value per group for the next layer of the neural network. For example, a 3×3 kernel could count only the left-most pixels and the centre pixel; the resulting value is then the sum of the counted pixels. The kernel slides over all groups in the matrix and returns a new matrix of values, which forms the next layer of the neural network. With a stride of 1 and padding at the image borders this does not decrease the number of values in the matrix: it is still 200×200, because the groups overlap. In this way the convolution layers are meant to extract features from the images, such as edges or other critical regions. In addition, max pooling layers are used to decrease the size of the matrices. A max pooling layer also runs a window over the matrix, but with a larger stride (the distance between successive placements of the window), so fewer groups of pixels are inspected and fewer values are output. Max pooling simply returns the largest value in each window, which summarizes the information of a group of pixels in a single value.
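The two operations described above can be written out by hand in a few lines. This is a sketch for illustration only, not the network used in the project: it assumes a stride of 1 with zero padding for the convolution (which is what keeps the output the same size as the input) and non-overlapping 2×2 windows for the pooling. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` with stride 1 and zero padding ('same' size)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:  # outside the image counts as 0
                        total += kernel[i][j] * image[yy][xx]
            out[y][x] = total
    return out


def max_pool(image: Matrix, size: int = 2) -> Matrix:
    """Keep the largest value of each non-overlapping size x size block."""
    h, w = len(image), len(image[0])
    return [
        [
            max(image[y + i][x + j] for i in range(size) for j in range(size))
            for x in range(0, w - size + 1, size)
        ]
        for y in range(0, h - size + 1, size)
    ]


# The kernel from the text: count the left-most column and the centre pixel
kernel = [[1, 0, 0],
          [1, 1, 0],
          [1, 0, 0]]
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
features = conv2d_same(image, kernel)
pooled = max_pool(features)
print(len(features), len(features[0]))  # same size as the input
print(len(pooled), len(pooled[0]))      # halved in each dimension
```

In a trained network the kernel values are not chosen by hand as they are here; they are the weights that the training procedure adjusts.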
After the convolution layers and max pooling layers have been applied to the original matrix representing the input image, a flattening layer is applied to turn the matrix into a single vector of values. On this vector, which now represents all the relevant information from the original image, we apply fully connected layers (also known as dense layers), which predict the label for the input image. The fully connected layers apply weights to the values in the input vector and calculate the predicted probabilities for each class of our classifier. In the first few fully connected layers the ReLU (Rectified Linear Unit) activation function is used, which sets negative values to zero and thereby makes the network non-linear; any reduction in the size of the vector comes from the number of units in each layer, not from the activation function. In the last fully connected layer, which outputs the results for all classes, the softmax activation function is used to normalize the output to values between 0 and 1 that sum to 1, denoting the probability that the input image belongs to each class.
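The final stage of the network (flatten, dense layers with ReLU, softmax output) can be sketched in plain Python. The layer sizes and weight values below are made-up toy numbers chosen for the example, not the project's trained parameters, and the helper names are hypothetical.

```python
import math
from typing import List


def flatten(m: List[List[float]]) -> List[float]:
    """Turn a matrix into a single vector, row by row."""
    return [x for row in m for x in row]


def dense(v: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one weighted sum (plus bias) per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v: List[float]) -> List[float]:
    """Set negative values to zero; the vector length is unchanged."""
    return [max(0.0, x) for x in v]


def softmax(v: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(v)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


# Hypothetical toy sizes: a 2x2 feature map, 3 hidden units, 2 classes
feature_map = [[1.0, -2.0], [0.5, 3.0]]
w1 = [[0.2, -0.1, 0.4, 0.1], [-0.3, 0.2, 0.1, -0.2], [0.5, 0.5, -0.5, 0.0]]
b1 = [0.0, 0.1, -0.1]
w2 = [[1.0, -1.0, 0.5], [-0.5, 0.3, 0.8]]
b2 = [0.0, 0.0]

hidden = relu(dense(flatten(feature_map), w1, b1))
probs = softmax(dense(hidden, w2, b2))
print(probs, sum(probs))
```

The predicted class is the index of the largest entry in `probs`. Note that the vector shrinks from 4 to 3 to 2 values only because `w1` and `w2` have 3 and 2 rows.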
Realization
Testing
Test plan
Scope
The first scenario to be tested is the 'out of context' scenario. In this situation we will test the software by making sure everything is set up correctly for a person to make some signs. There will be no other speech or discussion during this scenario. This scenario will be used to validate whether the software functions as expected. As few variables as possible will be considered.
The second scenario is real-world testing. In this scenario the group members will start a discussion, and then one of the members will express themselves through sign language. This scenario tests whether the software correctly recognises when it needs to translate, and whether the delay is low enough to keep the conversation alive.
Out of scope
We are not going to cover the actual implementation in MS Teams. We are also not going to cover multiple-frame signs. We will also not include subjects who are actually unable to speak.
Assumptions
Schedule
Preparation: The subject has to have the program installed. The subject's camera needs to be of sufficient quality, with sufficient lighting. The subject has to know the signs for the letters of their name and for their age.
Test 1: The camera responsible starts a recording of the screen. The subject also starts a recording. Both recordings need to include a clock showing the accurate time. The software is prepared to translate and, if necessary, the subject indicates the start of each sign to the software. The subject signs their name and their age. The output of the program is logged with timestamps.
Test 2: The camera responsible starts a recording of the screen. The subject also starts a recording. Both recordings need to include a clock showing the accurate time. An observer asks the subject what their name and age are. The subject responds in sign language. The output of the program is logged with timestamps.
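Because both tests rely on timestamped output, the delay per sign could be computed from the log afterwards. The sketch below is one possible way to do this and is not part of the actual software: the function name `response_delays`, the log format, and the example timestamps are all hypothetical, and it assumes that the start times read from the recordings and the logged outputs are listed in the same order.

```python
from datetime import datetime


def response_delays(sign_starts, output_log):
    """Pair each signed letter with the program output in the same position.

    sign_starts: list of (expected_label, start_time) read from the recording.
    output_log:  list of (logged_time, predicted_label) written by the program.
    Returns (expected, predicted, delay_in_seconds) for each pair.
    """
    results = []
    for (expected, started), (logged_at, predicted) in zip(sign_starts, output_log):
        delay = (logged_at - started).total_seconds()
        results.append((expected, predicted, delay))
    return results


# Hypothetical example: the subject signs "A" and then "B".
sign_starts = [
    ("A", datetime(2021, 3, 25, 14, 0, 0)),
    ("B", datetime(2021, 3, 25, 14, 0, 4)),
]
output_log = [
    (datetime(2021, 3, 25, 14, 0, 1, 500000), "A"),
    (datetime(2021, 3, 25, 14, 0, 6), "D"),
]

for expected, predicted, delay in response_delays(sign_starts, output_log):
    status = "correct" if expected == predicted else "wrong"
    print(f"{expected} -> {predicted} ({status}), delay {delay:.1f} s")
```

A summary like this makes it possible to check both correctness and delay for each sign without replaying the recordings by hand.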
Roles and responsibilities
Subject: Camera responsible: observer(s):
Tools
Exit criteria
Design evaluation
Week 1
Week 1 mostly consisted of putting together a group and deciding upon a topic. We settled on the topic of emotion recognition in children with ASD. Research has been done on this topic and references to similar projects have been gathered. The focus for next week is to explore what is possible to achieve within this topic.
Name | Student ID | Hours | Description |
---|---|---|---|
Sven Bierenbroodspot | 1334859 | 8.5 | meeting with group - deciding subject (1h 30m), gathering and reading sources (2h 30m), summarizing and further reading of sources (4h 30m) |
Sterre van der Horst | 1227255 | 10 | preparing group meeting (30m), meeting with group - deciding subject (1h 30m), finding relevant sources (2h 30m), summarizing and reading sources (4h 30m), finding more sources (1h) |
Pieter Michels | 1307789 | 8.5 | meeting with group - deciding subject (1h 30m), gathering and reading sources (2h), summarizing and further reading of sources (5h) |
Pim Rietjes | 1321617 | 8.5 | meeting with group - deciding subject (1h 30m), gathering and reading sources (3h), summarizing and further reading of sources (4h) |
Ruben Wolters | 1342355 | 8.5 | meeting with group - deciding subject (1h 30m), gathering and reading sources (3h), summarizing and further reading of sources (4h) |
Week 2
In week 2 we discussed the possible deliverables and came to the conclusion that it is difficult to find a dataset that we could use. Creating a dataset ourselves is nearly impossible due to the small target group and the current corona measures. For these reasons we abandoned the subject and discussed a new topic. The selected topic is to develop software that converts sign language into text using video as input.
Name | Student ID | Hours | Description |
---|---|---|---|
Sven Bierenbroodspot | 1334859 | 7 | meeting with supervisor (1h), ideation for new topics (2h), meeting deciding on new subject (1h), reading about new subject (3h) |
Sterre van der Horst | 1227255 | 13.75 | preparing meeting with supervisor (45m), meeting with supervisor (1h), finding new sources about ASD (3h), analyzing new sources (2h), meeting deciding new subject (1h), finding new sources about new subject (3h), summarizing new sources (3h) |
Pieter Michels | 1307789 | 10 | meeting with supervisor (1h), reading on old subject (2h), looking for databases on old subject (2h), meeting deciding on new subject (1h), reading about new subject (4h) |
Pim Rietjes | 1321617 | 11.5 | meeting with supervisor (1h), reading on old subject (3h), looking for databases on old subject (3h), meeting deciding on new subject (1h), looking at databases for new subject (3.5h) |
Ruben Wolters | 1342355 | 9 | meeting with supervisor (1h), reading on old subject (2h), looking for databases on old subject (2h), meeting deciding on new subject (1h), looking at databases for new subject (2h) |
Week 3
Name | Student ID | Hours | Description |
---|---|---|---|
Sven Bierenbroodspot | 1334859 | 6 | meeting with supervisor (1h), introducing wiki structure (1h), reading and summarizing sources (4h) |
Sterre van der Horst | 1227255 | 10.5 | preparing meeting with supervisor (30m), meeting with supervisor (1h), finding and reading new sources (5h), writing introduction and problem statement (2h), rewriting problem statement/introduction and adding to wiki (2h) |
Pieter Michels | 1307789 | 11 | meeting with supervisor (1h), reading and summarizing sources (4h), setting up coding environment (3h), getting familiar with TensorFlow (3h) |
Pim Rietjes | 1321617 | 10 | meeting with supervisor (1h), looking into example classifiers (2h), downloading and exploring datasets (3h), setting up coding environment (3h), getting familiar with TensorFlow (1h) |
Ruben Wolters | 1342355 | 5 | meeting with supervisor (1h), setting up coding environment (3h), getting familiar with TensorFlow (1h) |
Week 4
Name | Student ID | Hours | Description |
---|---|---|---|
Sven Bierenbroodspot | 1334859 | 5 | research sign language (1h), group meeting (30m), research into user (3h), user text on wiki (30m) |
Sterre van der Horst | 1227255 | 6.5 | research sign language (2h), writing section about what is sign language (2h), group meeting (30m), adding references to previously written problem statement and what is sign language (30m), first draft questionnaire (1.5h) |
Pieter Michels | 1307789 | 7.5 | meeting with supervisor (1h), adding timetables to wiki (30m), investigating TensorFlow and Keras (2h 30m), group meeting (30m), working on classifier (3h) |
Pim Rietjes | 1321617 | 12.5 | meeting with supervisor (1h), looking into example classifiers (2h), downloading and exploring datasets (1h), preprocessing data (4h), group meeting (30m), working on classifier (4h) |
Ruben Wolters | 1342355 | 4.5 | meeting with supervisor (1h), group meeting (30m), working on classifier (3h) |
Week 5
Name | Student ID | Hours | Description |
---|---|---|---|
Sven Bierenbroodspot | 1334859 | 6.5 | video for different display possibilities (5h), flowchart global design specs (1h 30m) |
Sterre van der Horst | 1227255 | description | |
Pieter Michels | 1307789 | 10 | meeting with supervisor (1h), worked on the preprocessor for the classifier (5h), making sure preprocessor works (1h), working on state-of-the-art section, which then got deleted for some unknown reason and had to be redone (3h) |
Pim Rietjes | 1321617 | 13.5 | meeting with supervisor (1h), writing text on the classifier (2.5h), calculating optical flow (moving signs) (4h), working on preprocessor (1h), working on neural network (static signs) (2h), working on neural network (moving signs) (3h) |
Ruben Wolters | 1342355 | 3 | meeting with supervisor (1h), working on preprocessor (2h) |
Week 6
Name | Student ID | Hours | Description |
---|---|---|---|
Sven Bierenbroodspot | 1334859 | 8 | meeting with supervisor (1h), design concept writing (2h), user writing (2h), test plan writing (3h) |
Sterre van der Horst | 1227255 | description | |
Pieter Michels | 1307789 | 10 | meeting with supervisor (1h), rewriting/checking the state-of-the-art section (5h), investigating how to integrate classifier with Teams (4h) |
Pim Rietjes | 1321617 | 13.5 | meeting with supervisor (1h), working on single-frame classifier/demo (2h 30m), fixing bugs/errors in multiple-frame CNN (5h), creating classifier model with multiple-frame CNN (3h), working on demo of multiple-frame CNN (1h), writing text for technical spec (1h) |
Ruben Wolters | 1342355 | 5 | meeting with supervisor (1h), working on classifier (2h 30m), proof-reading and correcting wiki sections (1h 30m) |
Week 7
Name | Student ID | Hours | Description |
---|---|---|---|
Sven Bierenbroodspot | 1334859 | description | |
Sterre van der Horst | 1227255 | description | |
Pieter Michels | 1307789 | description | |
Pim Rietjes | 1321617 | description | |
Ruben Wolters | 1342355 | description |
Week 8
Name | Student ID | Hours | Description |
---|---|---|---|
Sven Bierenbroodspot | 1334859 | description | |
Sterre van der Horst | 1227255 | description | |
Pieter Michels | 1307789 | description | |
Pim Rietjes | 1321617 | description | |
Ruben Wolters | 1342355 | description |
References
- Deafness and Hearing Loss - World Health Organization. (2021). WHO.
- Peruma, A., & El-Glaly, Y. N. (2017). CollabAll: Inclusive discussion support system for deaf and hearing students. ASSETS 2017 - Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility, 315–316.
- Microsoft Teams reaches 115 million DAU—plus, a new daily collaboration minutes metric for Microsoft 365 - Microsoft 365 Blog. (2021).
- European Commission. (2020). Telework in the EU before and after the COVID-19: where we were, where we head to. Science for Policy Briefs, 2009, 8.
- Richardson, J. T. E., Long, G. L., & Foster, S. B. (2004). Academic engagement in students with a hearing loss in distance education. Journal of Deaf Studies and Deaf Education, 9(1), 68–85.
- Glasser, A., Kushalnagar, K., & Kushalnagar, R. (2019). Deaf, Hard of Hearing, and Hearing perspectives on using Automatic Speech Recognition in Conversation. ArXiv, 427–432.
- Scarlett, W. G. (2015). American Sign Language. The SAGE Encyclopedia of Classroom Management.
- Perlmutter, D. M. (2013). What is Sign Language? Linguistic Society of America, 6501(202).
- Tatman, R. (2015). The Cross-linguistic Distribution of Sign Language Parameters. Proceedings of the Annual Meeting of the Berkeley Linguistics Society, 41(January).
- How Much Does A Sign Language Interpreter Cost | UTS. (n.d.). Retrieved March 25, 2021.
- Table B-3. Average hourly and weekly earnings of all employees on private nonfarm payrolls by industry sector, seasonally adjusted. Retrieved March 25, 2021.
- European Commission. (2020). Telework in the EU before and after the COVID-19: where we were, where we head to. Science for Policy Briefs, 2009, 8.
- Curry, D. (2021). Microsoft Teams Revenue and Usage Statistics. BusinessofApps.com.
- Pu, J., Zhou, W., & Li, H. (2019). Iterative Alignment Network for Continuous Sign Language Recognition. 4160–4169. https://doi.org/10.1109/CVPR.2019.00429
- U. von Agris, M. Knorr and K. Kraiss, "The significance of facial features for automatic sign language recognition," 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition, Amsterdam, Netherlands, 2008, pp. 1-6, doi: 10.1109/AFGR.2008.4813472.
- L. K. Phadtare, R. S. Kushalnagar and N. D. Cahill, "Detecting hand-palm orientation and hand shapes for sign language gesture recognition using 3D images," 2012 Western New York Image Processing Workshop, Rochester, NY, USA, 2012, pp. 29-32, doi: 10.1109/WNYIPW.2012.6466652.
- K. Imagawa, Shan Lu and S. Igi, "Color-based hands tracking system for sign language recognition," Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 1998, pp. 462-467, doi: 10.1109/AFGR.1998.670991.
- Cheok, M.J., Omar, Z. & Jaward, M.H. A review of hand gesture and sign language recognition techniques. Int. J. Mach. Learn. & Cyber. 10, 131–153 (2019). https://doi.org/10.1007/s13042-017-0705-5