PRE2018 4 Group8: Difference between revisions
(21 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
<div style="font-family: 'Arial'; font-size: 16px; line-height: 1.5; max-width: 1300px; word-wrap: break-word; color: #333; font-weight: 400; box-shadow: 0px 25px 35px -5px rgba(0,0,0,0.75); margin-left: auto; margin-right: auto; padding: 70px; background-color: rgb(255, 255, 253); padding-top: 25px; transform: rotate(0deg)"> | <div style="font-family: 'Arial'; font-size: 16px; line-height: 1.5; max-width: 1300px; word-wrap: break-word; color: #333; font-weight: 400; box-shadow: 0px 25px 35px -5px rgba(0,0,0,0.75); margin-left: auto; margin-right: auto; padding: 70px; background-color: rgb(255, 255, 253); padding-top: 25px; transform: rotate(0deg)"> | ||
= | <font size='7'>Emotion recognition</font> | ||
---- | |||
==Members== | ==Members== | ||
{| class="wikitable" style="border-style: solid; border-width: 1px;" cellpadding="4" | {| class="wikitable" style="border-style: solid; border-width: 1px;" cellpadding="4" | ||
Line 85: | Line 90: | ||
* The analyzed results of an interview, conducted on several groups of carers in an elderly home. | * The analyzed results of an interview, conducted on several groups of carers in an elderly home. | ||
===Steps to be taken=== | |||
First, a study of the state of the art was conducted to become familiar with the different techniques for using a CNN for facial recognition. This information was crucial for deciding how our project goes beyond what has already been researched. Next, a database with relevant photos will be constructed for training the CNN, as well as one for testing it. A CNN will be constructed, implemented and trained to recognize emotions using the database with photos. After training, the CNN will be tested with the testing database and, if time allows, on real people. The usefulness of this CNN in elderly care robots will then be analyzed, as well as the ethical aspects surrounding our application. | |||
== Case-study == | == Case-study == | ||
Line 235: | Line 245: | ||
[[File:Socialrobot.png|thumb|400px|left|alt=Alt text|Figure 5: The SocialRobot]] [[File:Socialrobotactions.png|thumb|400px|right|alt=Alt text|Figure 6: The SocialRobot can carry out several different actions, like a) enter the room, b) approach a person, c) perform facial recognition, d) interact with a person, e) establish a social connection via its Skype interface, f) leave the room]] | [[File:Socialrobot.png|thumb|400px|left|alt=Alt text|Figure 5: The SocialRobot]] [[File:Socialrobotactions.png|thumb|400px|right|alt=Alt text|Figure 6: The SocialRobot can carry out several different actions, like a) enter the room, b) approach a person, c) perform facial recognition, d) interact with a person, e) establish a social connection via its Skype interface, f) leave the room]] | ||
=== SocialRobot project === | === SocialRobot project === | ||
Line 453: | Line 458: | ||
- MMI: | - MMI: | ||
- Faces DB: Dataset of our choice, it contains good annotated pictures from 35 subjects of different age | - Faces DB: the dataset of our choice; it contains well-annotated pictures of 35 subjects from different age categories. | ||
- Belfast Database | - Belfast Database | ||
Line 476: | Line 481: | ||
For the training, validation and initial testing, a dataset called FacesDB <ref>FacesDB website, http://app.visgraf.impa.br/database/faces/</ref> is used. There will be two test sets, one of the people with estimated ages below 60, and one of the people with estimated ages above 60. After this, the goal is to perform a test on elderly people, where images of their facial expressions are processed in real-time. | For the training, validation and initial testing, a dataset called FacesDB <ref>FacesDB website, http://app.visgraf.impa.br/database/faces/</ref> is used. There will be two test sets, one of the people with estimated ages below 60, and one of the people with estimated ages above 60. After this, the goal is to perform a test on elderly people, where images of their facial expressions are processed in real-time. | ||
The first thing to be tested is whether a network trained only on images of people younger than the estimated age of 60, will still predict the right emotions for elderly people. | The first thing to be tested is whether a network trained only on images of people with an estimated age below 60 will still predict the right emotions for elderly people. The initial plan was to get this working first and then make it work in real time. However, priorities were changed: getting the real-time program to work took precedence over improving the classification of elderly people. | ||
===Dataset=== | ===Dataset=== | ||
Line 486: | Line 491: | ||
===Architecture=== | ===Architecture=== | ||
For a complicated problem like this, a simple CNN does not suffice. This is why, instead of building our own CNN, for now, a well-known image classification model is used. This CNN is called VGG16.<ref>Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.</ref> This network has been trained for the classification of 1000 different objects in images. Using transfer learning, we will be using this network for facial emotion classification. As can be seen in Figure 7, the CNN is build up of several layers. Some of these layers are convolutional layers that use filters to extract data from the input they receive, hence the name convolutional neural network. The data is transferred between these layers, from the input towards the output, using weighted connections. These weights scale the data when transferred to the next layer and are the part of the CNN which is actually optimized during training. Since the VGG16 network is already trained. There is no need to train all these weights again. The only thing needed to do is teaching the network to recognize the seven emotions that are needed as output, instead of over 1000 different objects. First, the output layer itself is changed to have the number of outputs needed for the problem at hand, which is seven. Next, the weights connecting to the output are trained. This is done by "freezing" all the weights, except for those connected to the output layer, so that only their values can be altered during training. Then the network, and thus essentially the final set of weights | For a complicated problem like this, a simple CNN does not suffice. This is why, instead of building our own CNN, for now, a well-known image classification model is used. This CNN is called VGG16.<ref>Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. 
arXiv preprint arXiv:1409.1556.</ref> This network has been trained to classify 1000 different object categories in images. Using transfer learning, we reuse this network for facial emotion classification. As can be seen in Figure 7, the CNN is built up of several layers. Some of these are convolutional layers, which use filters to extract features from the input they receive, hence the name convolutional neural network. The data is passed between these layers, from the input towards the output, over weighted connections. These weights scale the data as it is passed to the next layer, and they are the part of the CNN that is actually optimized during training. Since the VGG16 network is already trained, there is no need to train all these weights again. The only thing left to do is to teach the network to recognize the seven emotions needed as output, instead of 1000 different objects. First, the output layer itself is changed to have the number of outputs needed for the problem at hand, which is seven. Next, the weights connecting to the output layer are trained. This is done by "freezing" all weights except those connected to the output layer, so that only the latter can be altered during training. Then the network, and thus essentially only the final set of weights, is trained on the data. | ||
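The effect of freezing everything except the new output layer can be made concrete by counting parameters. The sketch below is only an illustration: it assumes the standard VGG16 layout from the cited paper (a last hidden layer of 4096 units and 138,357,544 parameters in total), and it assumes that only the 1000-class output layer is replaced. The text does not state whether the project kept exactly this layout.

```python
# Parameter count when the VGG16 output layer is swapped for a 7-emotion layer.
# Assumption: standard VGG16 layout; the project's exact layout is not given.
VGG16_TOTAL_PARAMS = 138_357_544  # published size of VGG16 with its 1000-class output
PENULTIMATE_UNITS = 4096          # width of the last hidden layer in standard VGG16


def dense_params(n_in, n_out):
    """Number of weights plus biases in a fully connected layer."""
    return n_in * n_out + n_out


old_head = dense_params(PENULTIMATE_UNITS, 1000)  # original output layer
new_head = dense_params(PENULTIMATE_UNITS, 7)     # seven emotions

frozen = VGG16_TOTAL_PARAMS - old_head  # weights that stay fixed during training
trainable = new_head                    # weights that are actually optimized
trainable_fraction = trainable / (frozen + trainable)

print(f"frozen parameters:    {frozen:,}")
print(f"trainable parameters: {trainable:,}")
print(f"trainable fraction:   {trainable_fraction:.4%}")
```

Under these assumptions only a tiny fraction of the network is trained, which is why transfer learning is feasible on a dataset as small as FacesDB.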
One problem that was encountered whilst working with the VGG16 network is that, since it has been previously trained, it expects a certain input image size. This size is 224x224 pixels. As mentioned before, the dataset which was used contains pictures of 640x480 pixels. This means the pictures have to be cropped to the correct size. This has to be done specifically around the face since this is the relevant area. | One problem that was encountered whilst working with the VGG16 network is that, since it has been previously trained, it expects a certain input image size. This size is 224x224 pixels. As mentioned before, the dataset which was used contains pictures of 640x480 pixels. This means the pictures have to be cropped to the correct size. This has to be done specifically around the face since this is the relevant area. | ||
Line 493: | Line 498: | ||
===Real time data cropping=== | ===Real time data cropping=== | ||
For a real-time application of our software, we use the build in webcam of a laptop. Using OpenCV 4.1 <ref>Pypi, "OpenCV", https://pypi.org/project/opencv-python/</ref> getting video footage is straight forward. But the size of the obtained video frames is not 224 by 224 pixels. Which gives rise to a similar problem as mentioned above for the database pictures. Two possible solutions are downsampling the video frames or cropping the images. The images from the database are rectangles in portrait. The images from the webcam are rectangles in landscape. Downsampling these to square format would lead to respectively wide and long faces which can’t be compared properly. So, we decided it was necessary to crop the faces. The point with cropping is that you don’t want to cut away (parts of) the face from the image. Using a pretrained face recognition CNN the location of the face in the image can be determined. After that, a square image around this face is cut out which can be downsampled if necessary to obtain a 224 by 224-pixel image. Based on the “Autocropper” application <ref>F. Leblanc, “Autocropper”, Github, https://github.com/leblancfg/autocrop</ref> we wrote a code which performs the tasks described above. | For a real-time application of our software, we use the built-in webcam of a laptop. Using OpenCV 4.1 <ref>Pypi, "OpenCV", https://pypi.org/project/opencv-python/</ref>, getting video footage is straightforward. However, the obtained video frames are not 224 by 224 pixels, which gives rise to a problem similar to the one mentioned above for the database pictures. Two possible solutions are downsampling the video frames or cropping the images. The images from the database are rectangles in portrait orientation, while the images from the webcam are rectangles in landscape orientation. Downsampling both directly to a square format would make the faces look too wide and too long, respectively, so that they could not be compared properly. We therefore decided it was necessary to crop the faces. 
The main concern when cropping is not to cut away (parts of) the face from the image. Using a pretrained face recognition CNN, the location of the face in the image can be determined. After that, a square image around this face is cut out, which can be downsampled if necessary to obtain a 224 by 224 pixel image. Based on the “Autocropper” application <ref>F. Leblanc, “Autocropper”, Github, https://github.com/leblancfg/autocrop</ref>, we wrote code that performs the tasks described above. | ||
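The geometry of the square crop can be sketched as follows. This is not the project's code: it assumes that a face detector has already returned a bounding box as (x, y, width, height) in pixels, and the function name and the 20% margin are hypothetical choices made for illustration.

```python
def square_crop_box(frame_w, frame_h, face_x, face_y, face_w, face_h, margin_percent=20):
    """Return (left, top, side) of a square crop around a face box.

    The square is centered on the face, enlarged by a margin on every side,
    and shifted or shrunk where needed so that it stays inside the frame.
    Integer arithmetic is used throughout to get exact pixel values.
    """
    side = max(face_w, face_h) * (100 + 2 * margin_percent) // 100
    side = min(side, frame_w, frame_h)  # the square can never exceed the frame

    center_x = face_x + face_w // 2
    center_y = face_y + face_h // 2

    # Clamp the top-left corner so the whole square lies inside the frame.
    left = min(max(center_x - side // 2, 0), frame_w - side)
    top = min(max(center_y - side // 2, 0), frame_h - side)
    return left, top, side


# Hypothetical face box in a 640x480 webcam frame.
left, top, side = square_crop_box(640, 480, 250, 100, 120, 160)
scale = 224 / side  # downsampling factor needed to reach 224x224 pixels
print(left, top, side, scale)
```

Because the crop is always square, any subsequent resize to 224 by 224 pixels scales both axes by the same factor, so the face is not distorted.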
===Results=== | ===Results=== | ||
[[File: test_old confusion matrix.jpg|thumb|387px|Figure 9: Confusion matrix of the test set which | [[File: test_old confusion matrix.jpg|thumb|387px|Figure 9: Confusion matrix of the test set which only contains elderly people. The confusion matrix has the true labels on the vertical axis, and the predicted labels on the horizontal axis.]] | ||
[[File: test confusion matrix.jpeg|thumb|387px|Figure 10: Confusion matrix of the test set which | [[File: test confusion matrix.jpeg|thumb|387px|Figure 10: Confusion matrix of the test set which does not contain any elderly people. The confusion matrix has the true labels on the vertical axis, and the predicted labels on the horizontal axis.]] | ||
The resulting program for real-time use can be seen in Figure 8. The webcam image is cropped to the appropriate size around the face and is then used as input for the model. The program returns the emotion it | The resulting program for real-time use can be seen in Figure 8. The webcam image is cropped to the appropriate size around the face and is then used as input for the model. The program returns the emotion it recognizes in the video frame and a percentage indicating how certain it is about this. The output is shown at the bottom of the image. In the output, it can be seen that the certainty varies between 58% and 89%. It can also be seen that, even though the test subject is clearly smiling, the program predicted different emotions throughout the demo. Figure 8 depicts a demo that was done in real time. During this real-time demo, the network only outputs a prediction when it recognizes an emotion with at least 50% certainty over 5 video frames. This is done to prevent errors due to the model or to bad-quality frames in the video footage. Besides testing the real-time version of the program, two test sets of images were used to compare the predicted emotion with the true emotion. The results can be seen in Figures 9 and 10. The test set in Figure 9 only contains images of people who were estimated to be "elderly". The test set in Figure 10 only contains images of people who were not considered "elderly". | ||
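The rule of only reporting an emotion after 5 sufficiently certain frames can be sketched as a small filter. The text does not specify exactly how the 5 frames are combined, so this sketch assumes one plausible reading: the same emotion must be the top prediction in each of the last 5 frames, each time with at least 50% certainty. The class name is hypothetical.

```python
from collections import deque


class StablePrediction:
    """Report an emotion only if it was predicted consistently over recent frames."""

    def __init__(self, window=5, min_certainty=0.5):
        self.min_certainty = min_certainty
        self.recent = deque(maxlen=window)  # holds (emotion, certainty) per frame

    def update(self, emotion, certainty):
        """Add one frame's top prediction; return the emotion if stable, else None."""
        self.recent.append((emotion, certainty))
        if len(self.recent) < self.recent.maxlen:
            return None  # not enough frames seen yet
        same_emotion = len({e for e, _ in self.recent}) == 1
        all_certain = all(c >= self.min_certainty for _, c in self.recent)
        return emotion if same_emotion and all_certain else None


# Hypothetical stream of per-frame predictions.
stable = StablePrediction()
frames = [("joy", 0.8), ("joy", 0.7), ("joy", 0.9), ("joy", 0.6), ("joy", 0.58), ("fear", 0.7)]
outputs = [stable.update(emotion, certainty) for emotion, certainty in frames]
print(outputs)
```

A single deviating or uncertain frame suppresses the output until 5 consistent frames have been seen again, which trades the number of outputs for reliability.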
As can be seen, when comparing Figures 9 and 10, the performance of the network on elderly people is worse than on the younger group, with only 11 out of 35 true positives in Figure 9 and 22 out of 35 true positives in Figure 10. Besides this, fear is a common false positive for both of the test sets. False positives that seem to occur more frequently in the test set only containing elderly are joy and sadness. | As can be seen, when comparing Figures 9 and 10, the performance of the network on elderly people is worse than on the younger group, with only 11 out of 35 true positives in Figure 9 and 22 out of 35 true positives in Figure 10. Besides this, fear is a common false positive for both of the test sets. False positives that seem to occur more frequently in the test set only containing elderly are joy and sadness. | ||
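The counts above can be read off a confusion matrix as sketched below. The matrices of Figures 9 and 10 are not reproduced here, so the 3x3 matrix is purely hypothetical; only the totals 11 out of 35 and 22 out of 35 come from the text. As in the figures, rows are true labels and columns are predicted labels.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal (true label equals predicted label)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def false_positives(confusion, label):
    """Number of samples predicted as `label` whose true label is different."""
    return sum(row[label] for true, row in enumerate(confusion) if true != label)


# Hypothetical 3-class confusion matrix, for illustration only.
toy = [
    [3, 1, 1],
    [0, 4, 1],
    [2, 0, 3],
]
print(accuracy(toy), false_positives(toy, 0))

# Accuracies implied by the counts reported in the text.
elderly_accuracy = 11 / 35  # Figure 9, roughly 31%
younger_accuracy = 22 / 35  # Figure 10, roughly 63%
print(f"{elderly_accuracy:.1%} versus {younger_accuracy:.1%}")
```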
Line 506: | Line 511: | ||
===Conclusion=== | ===Conclusion=== | ||
The automatic cropping of the input image works very well, it is able to recognize faces in most cases where it should. Even when the user is not sitting right in front of the camera | The automatic cropping of the input image works very well: it recognizes faces in most cases where it should, even when the user is not sitting right in front of the camera but is slightly turned away, or when her/his face is not in the center of the screen. | ||
The network seems to perform better on younger people when compared to elderly people. This could be due to the wrinkles that mostly appear in the faces of the elderly throwing off the network on its prediction. Another reason might be since the emotions were acted and not natural, that the elderly people had a more difficult time acting out these emotions. Some emotions of images were difficult to interpret even for humans, which means this could very well be a cause for the lower accuracy. Overall, this network seems to lack good training data. Not all emotions were acted out very convincingly or recognizable. Besides this, the noise of the background and body language (for instance leaning forwards or backwards) seemed to influence the prediction of the model in a significant way. This is most likely due to the images in FacesDB all having a black background. Since a black background is rarely the case in real-live application, this has to be improved before the program can actually be used. The fact that body language influences the prediction is not necessarily a problem since it is also a human tool of communication. This could, however, lead to many errors due to the positioning of the camera relative to the user. Suggesting that the user is leaning forward while this is not the case, leading to a wrong prediction. | The network seems to perform better on younger people when compared to elderly people. This could be due to the wrinkles that mostly appear in the faces of the elderly throwing off the network on its prediction. Another reason might be since the emotions were acted and not natural, that the elderly people had a more difficult time acting out these emotions. Some emotions of images were difficult to interpret even for humans, which means this could very well be a cause for the lower accuracy. Overall, this network seems to lack good training data. Not all emotions were acted out very convincingly or recognizable. 
Besides this, background noise and body language (for instance leaning forwards or backwards) seemed to influence the prediction of the model significantly. This is most likely because the images in FacesDB all have a black background. Since a black background is rare in real-life applications, this has to be improved before the program can actually be used. The fact that body language influences the prediction is not necessarily a problem, since it is also a human means of communication. It could, however, lead to many errors due to the positioning of the camera relative to the user: the camera angle may suggest that the user is leaning forward when this is not the case, leading to a wrong prediction. | ||
The real-time program compensates for some of the errors made by the model. The fact that it only outputs a prediction when a certain emotion is predicted with certain | The real-time program compensates for some of the errors made by the model. The fact that it only outputs a prediction when an emotion is predicted with sufficient certainty over 5 video frames improves the accuracy of its output. However, it also decreases the number of outputs of the program. This is not necessarily a bad thing, but other methods could be implemented that do not result in fewer outputs. | ||
===Further Research=== | ===Further Research=== | ||
Line 518: | Line 523: | ||
=== The set-up === | === The set-up === | ||
The Technology Acceptance Model (TAM) <ref> Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS quarterly, 319-340. </ref> states that two factors determine the elderly person's acceptance of | The Technology Acceptance Model (TAM) <ref> Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS quarterly, 319-340. </ref> states that two factors determine the elderly person's acceptance of robot technology: | ||
- perceived ease-of-use (PEOU) | - perceived ease-of-use (PEOU) | ||
- perceived usefulness (PU) | - perceived usefulness (PU) | ||
Line 681: | Line 686: | ||
== Conclusion == | == Conclusion == | ||
We have developed facial emotion recognition software using CNN’s. By employing transfer learning on the VGG16 neural network we wrote a program which recognizes seven emotions and can evaluate the certainty of its own prediction. The network was trained on the FacesDB dataset, which contains pictures of 35 participants among which five elderly. The program seems to work better on younger faces. During the test on elderly from the database only 11 out of 35 pictures were predicted correctly. But fortunately, after adding pictures of two group members to the database, the overall accuracy of the network increased significantly. Predictions on the faces of these group members could be performed easily as was shown during a real-time demo on Monday June 17. We conclude that our network does not have the accuracy needed after training on the data-set, but by training on the subject this accuracy might still be reached. | We have developed facial emotion recognition software using CNNs. By employing transfer learning on the VGG16 neural network, we wrote a program that recognizes seven emotions and reports the certainty of its own prediction. The network was trained on the FacesDB dataset, which contains pictures of 35 participants, five of whom are elderly. The program seems to work better on younger faces. In the test on the elderly people from the database, only 11 out of 35 pictures were predicted correctly. Fortunately, after adding pictures of two group members to the database, the overall accuracy of the network increased significantly. Predictions on the faces of these group members could be performed easily, as was shown during a real-time demo on Monday, June 17. We conclude that our network does not reach the required accuracy after training on the dataset alone, but that this accuracy might still be reached by training on the subject. | ||
We have looked at possible applications of our network and did research into the acceptance of a companion/care robot with our technology. A survey was distributed under a group of independently living elderly between 70 and 86 years old. Their opinions on the usefulness of the robot differ, but we could conclude that there is a market for our product. Two critical points came to sight after the survey; In general, the participants indicate that, despite our taken measures, their feeling of privacy would be affected when using the robot. Besides that, they would feel infantilized when the robot suggests actions based on their emotional state. These points will need further attention before this technology can be implemented. | We have looked at possible applications of our network and researched the acceptance of a companion/care robot with our technology. A survey was distributed among a group of independently living elderly people between 70 and 86 years old. Their opinions on the usefulness of the robot differ, but we could conclude that there is a market for our product. Two critical points came to light in the survey. In general, the participants indicated that, despite the measures we took, their feeling of privacy would be affected when using the robot. Besides that, they would feel infantilized if the robot suggested actions based on their emotional state. These points need further attention before this technology can be implemented. | ||
We also spoke with caregivers from elderly care homes and asked for their opinion on a companion/care robot with our technology. To our surprise, they did not seem to be concerned about being recorded themselves. They were positive about our idea to put a physical barrier in the form of a flap over the robot’s camera when it is not recording. Most of the | We also spoke with caregivers from elderly care homes and asked for their opinion on a companion/care robot with our technology. To our surprise, they did not seem to be concerned about being recorded themselves. They were positive about our idea to put a physical barrier in the form of a flap over the robot’s camera when it is not recording. Most of the caregivers were overall positive about the implementation of such a care robot in elderly care. But we also heard serious concerns from professionals who fear negative consequences when robots take over elderly care. | ||
In summary, there is a need for facial emotion recognition software which is specialized to be applied | In summary, there is a need for facial emotion recognition software which is specialized to be applied to elderly people in the context of care homes. This specialization should not only be sought in training the CNN to perform well on the elderly, but also in the protection of dignity and the feeling of privacy of the subject. We made a first step towards such software in this project. Our results are still far from perfect, but they show that with the current technology this goal is reachable. | ||
== Overall discussion == | == Overall discussion == | ||
Unfortunately we could not take the surveys at the elderly home as we planned, because the director did not want it. He explained that his company stands for qualitative good care for elderly. And that in his vision the use of care robots in elderly care is against this. Of course we understand that he doesn't want to give his patients the feeling that the care home is considering replacing nurses by robots, which could be a (shortsighted) derived conclusion when he decides to work with us. | Unfortunately, we could not conduct the surveys at the elderly home as planned, because the director did not allow it. He explained that his company stands for high-quality care for the elderly and that, in his view, the use of care robots in elderly care conflicts with this. Of course, we understand that he does not want to give his patients the feeling that the care home is considering replacing nurses with robots, which is a (shortsighted) conclusion they could draw if he decided to work with us. | ||
== Planning == | == Planning == | ||
Deleted | |||
==Sources== | |||
<references /> | <references /> |
Latest revision as of 14:42, 18 February 2020
Emotion recognition
Members
Name | Student ID | Email | Study |
---|---|---|---|
Rik Hoekstra | 1262076 | r.hoekstra@student.tue.nl | Applied Mathematics |
Kilian Cozijnsen | 1004704 | k.d.t.cozijnsen@student.tue.nl | Biomedical Engineering |
Arthur Nijdam | 1000327 | c.e.nijdam@student.tue.nl | Biomedical Engineering |
Selina Janssen | 1233328 | s.a.j.janssen@student.tue.nl | Biomedical Engineering |
Ideas
Surgery robots
The DaVinci surgery system has become a serious competitor to conventional laparoscopic surgery techniques. This is because the machine has more degrees of freedom, allowing surgeons to carry out movements that they could not perform with other techniques. The DaVinci system is controlled by the surgeon, who therefore has full control over and responsibility for the result. However, as robots become more developed, they might become more autonomous as well. Mistakes can still occur, albeit perhaps less frequently than with human surgeons. In such cases, who is responsible: the robot manufacturer or the surgeon? In this research project, the ethical implications of autonomous robot surgery could be addressed.
Elderly care robots
The aging population is rapidly increasing in most developed countries, while vacancies in elderly care often remain unfilled. Elderly care robots could therefore be a solution, as they relieve the pressure on the carers of elderly people. They can also offer more specialized care and aid the person in their social development. However, the information recorded by the sensors and the video images recorded by cameras should be well protected, as the privacy of the elderly should be ensured. In addition, robot care should not infantilize the elderly and should respect their autonomy.
Facial emotion recognition
Facebook uses advanced Artificial Intelligence (AI) to recognize faces. This data can be used or misused in many ways. Totalitarian governments can use such techniques to control the masses, but care robots could use facial recognition to read the emotional expression of the person they are taking care of. In this research project, facial recognition for emotion regulation can be explored, as there are interesting technical and ethical implications that this technology might have on the field of robotic care.
Introduction
19.9% [1] of the elderly report that they experience feelings of loneliness. A potential cause of this is that they have often lost many of their family members and friends. It is an alarming figure, as loneliness has been linked to increased rates of depression and of degenerative diseases in the elderly. A solution to the problem of lonely elderly people could be an assistive robot with a human-robot interactive aspect, which can also communicate with primary or secondary carers. Such a robot would recognize the facial expression of the elderly person and deduce their needs from it. If the elderly person looks sad, the robot might suggest that they contact a family member or a friend via a Skype call. However, the technology for such interaction is still rather immature.
The EU-funded SocialRobot project [2] has developed a care robot which can move around, record audio or video, and most importantly, it can perform emotion recognition on audio recordings. However, the accuracy of the emotion classification based on audio alone is somewhere between 60 and 80 percent (depending on the specific emotion) [3], which is inadequate for the envisioned application. Research has shown that the combination of video images and recorded speech data is especially powerful and accurate in determining an elderly person's emotion. Therefore, this research project proposes a package for facial emotion recognition, as can be used for the SocialRobot project. [4]
If robots could accurately recognize human emotions, this would allow for more accurate robot-human interaction. There is simply an extra feedback mechanism for the robot to base the next actions on. Nevertheless, the use of facial recognition does raise moral and legal questions, especially concerning privacy and autonomy of the elderly person, which have been found to be important values in the field of social robots for elderly care [5]. Besides that, thorough research should be conducted on whether the primary users (the elderly people themselves) and the secondary users (formal and informal carers) perceive automated emotion recognition technology as useful in their work and their daily lives. This project investigates in what way Convolutional Neural Networks (CNNs) can/should be used for the purposes of emotion recognition in elderly care robots.
Objectives
- Construct a CNN that must be able to distinguish at least 1 emotion from other emotions. (e.g. happiness as opposed to 'neutral' emotions)
- Analyze the technical possibilities of a CNN applied on the elderly
- Analyze the perceived usefulness of automated emotion recognition technology for the envisioned primary and secondary users
- Analyze the ethical implications of using a CNN for emotion recognition in elderly care robots
Problem Statement
Research Question
We chose emotion recognition in the elderly as our subject of study. For this purpose, the following research question was defined:
In what way can Convolutional Neural Networks (CNNs) be used to perform emotion recognition on real-time video images of lonely elderly people?
Sub-subjects
Based on the research question, a set of sub-questions was identified. Together, these sub-questions are meant to answer the research question.
- Technical sub-questions:
  - What are the requirements for the dataset that will be used?
  - What is a suitable CNN architecture to analyze dynamic facial expressions?
- USE sub-questions:
  - What is a possible application of our CNN emotion recognition technology in the SocialRobot project?
  - In what way would users, society and enterprises benefit from emotion recognition software in social robots?
  - What are the consequences of false positives versus false negatives in emotion recognition?
  - What is the acceptance of our technology by the envisioned primary and secondary users?
  - Are there legal or moral issues that will impede the application of our technology?
Deliverables
The deliverables of this project include:
- Software: a neural network trained to recognize emotion from pictures of facial expressions. This software should be able to distinguish between the 7 basic facial expressions: anger, disgust, joy, fear, surprise, sadness and neutral.
- A wiki page that describes the entire process the group went through during the research, as well as a technical and USE evaluation of the software product.
- The analyzed results of a survey, geared towards the acceptance of robots in their home by elderly people.
- The analyzed results of an interview, conducted on several groups of carers in an elderly home.
Steps to be taken
First, a study of the state of the art was conducted to become familiar with the different techniques for using a CNN for facial recognition. This information was crucial for deciding how our project will go beyond what has already been researched. Next, a database of relevant photos will be constructed for training the CNN, as well as one for testing it. A CNN will be constructed, implemented and trained to recognize emotions using the training database. After training, the CNN will be tested with the testing database and, if time allows, on real people. The usefulness of this CNN in elderly care robots will then be analyzed, as well as the ethical aspects surrounding our application.
Case-study
To give an insight into the potential applications of emotion recognition software in care robots for lonely elderly, the following case study has been constructed:
Bart always describes himself as "pretty active back in his days". But now that he's reached the age of 83 he is not that active anymore. He lives in an apartment complex for elderly people, with an artificial companion named John. John is a care-robot, that besides helping Bart in the household also functions as an interactive conversation partner. Every morning after greeting Bart, the robot gives a weather forecast. This is a trick it learned from the analysis of casual human-human conversations which almost always start with a chat about the weather. Today this approach seems to work fine, as after some time Bart reacts with: “Well, it promises to be a beautiful day, doesn’t it?” But if this robot was equipped with simple emotion recognition software it would have noticed that a sad expression appeared on Bart’s face after the weather forecast was mentioned. In fact, every time Bart hears the weather forecast he thinks about how he used to go out to enjoy the sun and the fact that he cannot do that anymore. With emotion recognition, the robot could avoid this subject in the future. The robot could even try to arrange with the nurses that Bart goes outside more often.
In this example, Bart would benefit from the implementation of facial emotion recognition software in his care robot. At the same time, a value conflict might arise. On the one hand, the implementation of emotion recognition software could seriously improve the quality of the care delivered by the care robot. On the other hand, we should seriously consider to what extent these robots may replace the interaction with real humans. And when the robot decides to take action to get the nurses to let Bart go outside more often, this might conflict with his right to privacy and autonomy. It might feel to Bart as if he is being treated like a child when the robot calls his carers without his consent.
State-of-the-Art technology
To promote the clarity of our literature study, the state-of-the-art technology section has been subdivided into three sections: Sources that provide information regarding the technical insight, sources that explain more about the implications our technology can have on USE stakeholders and sources that describe possible applications of our technology.
Technical insight
Neural networks can be used for facial recognition and emotion recognition. The approaches in literature can be classified based on the following elements:
- The database used for training of the data
- The feature selection method
- The neural network architecture
An example of a database is the Extended Cohn-Kanade database (CK+, see fig. 1) [6]. It contains pictures of a diverse set of people: the participants were aged 18 to 50, and 69% were female, 81% Euro-American, 13% Afro-American, and 6% from other groups. The participants started out with a neutral face and were then instructed to take on several emotions, in which different muscular groups in the face (called action units or AUs by the researchers) were active. The database contains labelled pictures, classified into 7 different emotions: anger, disgust, fear, happiness, sadness, surprise and contempt.
Source 9 has used this database. To extract the features, they first cropped the image and then normalized the intensity (see fig. 2.). The researchers computed the local deviation of the normalized image with a structuring element that had a size of NxN.
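The preprocessing described above can be sketched in a few lines. This is only an illustration under stated assumptions: the exact normalization used in the paper is not given here, so min-max scaling is assumed, the "local deviation" is taken to be the standard deviation inside each NxN window, and the function names are hypothetical.

```python
from statistics import pstdev


def normalize_intensity(image):
    """Min-max scale pixel values to the range 0..1 (an assumed normalization)."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero for a constant image
    return [[(p - lo) / span for p in row] for row in image]


def local_deviation(image, n=3):
    """Standard deviation inside each n x n window, clipped at the image border."""
    h, w, r = len(image), len(image[0]), n // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            row.append(pstdev(window))
        out.append(row)
    return out


# Toy 3x3 "image" with a vertical edge on the right-hand side.
cropped = [[0, 0, 255],
           [0, 0, 255],
           [0, 0, 255]]
deviation = local_deviation(normalize_intensity(cropped), n=3)
print(deviation[1])  # low values in flat regions, higher values near the edge
```

Flat regions give a deviation of zero, while windows that straddle an edge give larger values, which is why such a map highlights facial contours.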
As can be seen in fig. 3, the neural network architecture chosen by the researchers consisted of six convolution layers and two blocks of deep residual learning; because of its size, it is called a 'deep convolutional neural network'. In addition, each convolution layer is followed by a max-pooling layer, and there are two Fully Connected layers.
There are different ways of connecting neurons in machine learning. If every neuron is fully connected to the input and the input image is a small 200x200 pixel image, each neuron already has 40,000 weights. The convolutional layers apply a convolution operation to the input, in which a small set of weights is shared across all image positions, which reduces the number of free parameters. The max-pooling layers combine the outputs of several neurons of the previous layer into a single neuron in the next layer; specifically, the one with the highest value is taken as the input value for the next layer. This shrinks the feature maps and thereby further reduces the number of parameters in the layers that follow. The Fully Connected layers connect each neuron in one layer to each neuron in the next layer. After training the neural network, each connection has a learned weight.
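The parameter savings discussed here can be checked with a little arithmetic. The sketch below is only illustrative: the 200x200 input comes from the text, while the layer sizes (100 neurons, 32 filters of 3x3) are assumptions chosen for the example and are not taken from the cited paper.

```python
def dense_weights(n_inputs: int, n_neurons: int) -> int:
    """Weights in a fully connected layer (biases ignored)."""
    return n_inputs * n_neurons


def conv_weights(kernel: int, in_channels: int, n_filters: int) -> int:
    """Weights in a convolutional layer; each kernel is shared across all positions."""
    return kernel * kernel * in_channels * n_filters


def max_pool_2x2(feature_map):
    """Keep the highest value in each non-overlapping 2x2 block."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


pixels = 200 * 200                   # grayscale input size from the text
print(dense_weights(pixels, 1))      # 40000 weights for a single neuron
print(dense_weights(pixels, 100))    # 4000000 for an assumed 100-neuron layer
print(conv_weights(3, 1, 32))        # 288 for 32 assumed 3x3 filters
print(max_pool_2x2([[1, 5, 2, 0],
                    [3, 4, 1, 1],
                    [0, 0, 9, 2],
                    [1, 2, 3, 4]]))  # [[5, 2], [2, 9]]
```

The pooling example halves the width and height of the feature map, so any fully connected layer placed after it needs four times fewer weights per neuron.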
The rest of the sources used a similar approach, and have been classified in the same manner in the table below:
Article number | Database used | Feature selection method | Neural Network architecture | Additional information |
---|---|---|---|---|
[7] | own database | It recognizes facial expressions in the following steps: division of the facial images in three regions of interest (ROI), the eyes, nose, and mouth. Then, feature extraction happens using a 2D Gabor filter and LBP. PCA is adopted to reduce the number of features. | Extreme Learning Machine classifier | This article entails a robotic system that not only recognizes human emotions but also generates its own facial expressions in cartoon symbols |
[8] | Karolinska Directed Emotional Face (KDEF) dataset | Their approach is a Deep Convolutional Neural Network (CNN) for feature extraction and a Support Vector Machine (SVM) for emotion classification. | Deep Convolutional Neural Network (CNN) | This approach reduces the number of layers required and it has a classification rate of 96.26% |
[9] | The dataset used was Extended Cohn-Kanade (CK+) and the Japanese Female Expression (JAFFE) Dataset | not mentioned | Deep Convolutional Neural Networks (DNNs) | The researchers aimed to identify 6 different facial emotion classes. |
[10] | own database | The human facial expression images were recorded and then segmented by using the skin color. Features were extracted using integral optic density (IOD) and edge detection. | SVM-based classification | In addition to the analysis of facial expressions, also speech signals were recorded. They aimed to classify 5 different emotions, which happened at an 87% accuracy (5% more than the images by themselves). |
[11] | own database | unknown | Bayesian facial recognition algorithm | This article is from 1999 and stands at the basis of machine learning, using a Bayesian matching algorithm to predict which faces belong to the same person. |
[12] | unknown | This article uses a 3D candidate face model, that describes features of face movement, such as 'brow raiser' and they have selected the most important ones according to them | CNN | The joint probability describes the similarity between the image and the emotion described by the parameters of the Kalman filter of the emotional expression as described by the features, and it is maximized to find the emotion corresponding to the picture. This article is an advancement of the methods described in 8. The system is more effective than other Bayesian methods like Hidden Markov Models, and Principal Component Analysis. |
[13] | Cohn-Kanade database | unknown | Bayes optimal classifier | The tracking of the features was carried out with a Kalman Filter. The correct classification rate was almost 94%. |
[14] | own database | unknown | CNN | The method of moving average is utilized to make up for the drawbacks of still image-based approaches, which is efficient for smoothing the real-time FER results |
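The moving-average smoothing mentioned for [14] can be illustrated as follows. This is a minimal sketch, not the method of that paper: the window length, the class name `MovingAverageSmoother` and the example probabilities are assumptions, and the label order follows the seven expressions listed under Deliverables.

```python
from collections import deque

EMOTIONS = ["anger", "disgust", "joy", "fear", "surprise", "sadness", "neutral"]


class MovingAverageSmoother:
    """Average the last `window` probability vectors before picking a label."""

    def __init__(self, window: int = 3) -> None:
        self.history = deque(maxlen=window)  # old frames drop out automatically

    def update(self, probabilities):
        self.history.append(probabilities)
        n = len(self.history)
        averaged = [sum(frame[i] for frame in self.history) / n
                    for i in range(len(probabilities))]
        return EMOTIONS[averaged.index(max(averaged))]


# Made-up per-frame outputs: a steady neutral face with one spurious "fear" frame.
neutral = [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.70]
glitch = [0.05, 0.05, 0.05, 0.60, 0.05, 0.05, 0.15]

smoother = MovingAverageSmoother(window=3)
labels = [smoother.update(frame) for frame in [neutral, neutral, glitch, neutral]]
print(labels)  # ['neutral', 'neutral', 'neutral', 'neutral']
```

Classified on its own, the third frame would be labelled 'fear'; averaging over neighbouring frames suppresses that single-frame error at the cost of a short delay in reacting to real changes.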
Sources about USE implications
Some elderly people have problems recognizing emotions [15]. This is problematic, as carers in primary care facilities for the elderly use their own emotional expressions as part of the care, e.g. smiling to cheer the elderly person up. It would be very useful for the elderly to have a device similar to the Autismglass [16]. The Autismglass is, in fact, a Google Glass that was equipped with facial emotion recognition software (see fig. 4). It is currently used to help children diagnosed with autism recognize the emotions of the people surrounding them.
Assistive social robots have a variety of effects or functions, including (i) increased health by decreased level of stress, (ii) more positive mood, (iii) decreased loneliness, (iv) increased communication activity with others, and (v) rethinking the past [17]. Most studies report positive effects. With regards to mood, companion robots are reported to increase positive mood, typically measured using evaluation of facial expressions of elderly people as well as questionnaires.
Possible applications
The research team of [18] has developed an android which has facial tracking, expression recognition and eye tracking for the purposes of treatment of children with high functioning autism. During the sessions with the robot, they can enact social scenarios, where the robot is the 'person' they are conversing with.
Source [19] has developed a doll which can recognize emotions and act accordingly using an Eyes of Things (EoT) device. This is an embedded computer vision platform which allows the user to develop artificial vision and deep learning applications that analyze images locally. By using the EoT in the doll, they eliminate the need for the doll to connect to the cloud for emotion recognition, which reduces latency and removes some ethical issues. It has been tested on children.
Source [20] uses facial expression recognition for safety monitoring and for monitoring the health status of the elderly.
USE evaluation of the initial plan
User
Human-robot interaction
According to Kacperck[21], effective communication in elderly care is dependent on the nurse’s ability to listen and utilize non-verbal communication skills. Argyle[22] says there are 3 distinct forms of human non-verbal communication:
- Non-verbal communication as a replacement for language
- Non-verbal communication as a support and complement of verbal language, to emphasize a sentence or to clarify the meaning of a sentence
- Non-verbal communication of attitudes and emotions and manipulation of the immediate social situation, for example when sarcasm is used.
Facial expressions play an important role in these forms of non-verbal communication. However, robots do not have the natural ability to recognize emotions as well as humans do. This can lead to problems with elderly care robots. For example, a patient might try to consciously or subconsciously convey a message to the robot using facial expressions and the display of emotions, but the robot might not recognize this or might recognize it inaccurately. The elderly person may get frustrated because they have to put everything they feel and want into words, which may lead to them appreciating their care less. In the worst case, they may not accept the robot because it feels too inhuman and cold.
The SocialRobot project already uses emotion recognition based on the tone of speech to deal with these problems. But this is not optimal, as the first form of human non-verbal communication gives rise to problems: if non-verbal communication is used as a replacement for speech, tone recognition will not help to determine the current emotional state. Emotion recognition based on facial expressions would work better in this case. On the other hand, when speech is used, analyzing the meaning of the words used often already provides information about the emotional state of the speaker. Both facial emotion recognition and tone-based emotion recognition should be able to complete the information in this case. The highest accuracy could be reached by combining image-based and tone-based emotion recognition, but such complicated software might not be necessary if facial emotion recognition already gives satisfactory results.
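One simple way to combine image-based and tone-based recognition is to average the two classifiers' probability outputs (often called late fusion). The sketch below is an assumption about how this could be done, not the SocialRobot implementation; the function name `fuse`, the weights and the example probabilities are all made up for illustration.

```python
def fuse(face, voice, face_weight=0.5):
    """Weighted average of two probability distributions over the same labels."""
    if face.keys() != voice.keys():
        raise ValueError("both classifiers must score the same emotions")
    fused = {e: face_weight * face[e] + (1 - face_weight) * voice[e] for e in face}
    total = sum(fused.values())
    return {e: p / total for e, p in fused.items()}


# Made-up outputs: the face looks sad, the voice sounds neutral.
face = {"sadness": 0.6, "neutral": 0.3, "happiness": 0.1}
voice = {"sadness": 0.2, "neutral": 0.7, "happiness": 0.1}

both = fuse(face, voice)                         # equal trust in both channels
silent = fuse(face, voice, face_weight=1.0)      # user is not speaking: face only
print(max(both, key=both.get), max(silent, key=silent.get))
```

Setting `face_weight` to 1.0 covers the case described above in which non-verbal communication replaces speech, so that only the facial channel contributes.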
Misusers
Every technology is prone to be misused as well. Facial emotion recognition technology can be very dangerous if it comes into the wrong hands and is used for immoral purposes. Totalitarian societies can use facial recognition technology to monitor continuously how their inhabitants are behaving and whether they are speaking the truth. Applications closer to elderly care might also be misused: The video data and the respective emotion classification of a vulnerable elderly person are stored in a robot. This information could be used in a way that goes against the free will of the elderly person, for example when it is shared with their carers against their will.
Society
combat loneliness
In most western societies, the aging population is ever increasing. This, in addition to the lack of healthcare personnel, poses a major problem to the future elderly care system. Robots could fill the vacancies and might even have the potential to outperform regular personnel: using smart neural networks, they can anticipate the elderly person's wishes and, if required, provide 24/7 specialized care in the elderly person's own home. The use of care robots in this way is supported by [23], which reports that care robots can not only be used in a functional way, but also to promote the autonomy of the elderly person by assisting them to live in their own home, and to provide psychological support. This is important, as social isolation is detrimental to both the quality of life and the mental state of the elderly.
While emotion recognition can be used on various kinds of target groups (see state-of-the-art section), the high levels of loneliness amongst the elderly are the motivation for the choice of the elderly as our target group. However, elderly people are still a broad target group with a wide range of needs, in which the following categories can be defined:
- Elderly people with regular mental and functional capacities.
- Elderly people with affected mental capacities but with decent physical capabilities.
- Elderly people with affected mental and physical capacities.
All of the categories of elderly people might cope with loneliness, but category 2 and 3 are more likely to need a care robot. They are also a vulnerable group of people, as they might not have the mental capacity to consent to the robot's actions. In this respect, interpreting the person's social cues is vital for their treatment, as they might not be able to put their feelings into words. For this group of elderly, false negatives for unhappiness can especially have an impact. To deduce what impact it can have, it is important to look at the possible applications of this technology in elderly care robots.
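The difference between false negatives and false positives can be made concrete by counting them on labelled test data. The sketch below uses 'sadness' as a stand-in for unhappiness; the function name and the example labels are made up for illustration and are not results of this project.

```python
def error_rates(true_labels, predicted_labels, target="sadness"):
    """False-negative and false-positive rates for one target emotion."""
    pairs = list(zip(true_labels, predicted_labels))
    false_neg = sum(t == target and p != target for t, p in pairs)
    false_pos = sum(t != target and p == target for t, p in pairs)
    positives = sum(t == target for t, _ in pairs)
    negatives = len(pairs) - positives
    return {
        # Share of truly sad moments that the system missed.
        "false_negative_rate": false_neg / positives if positives else 0.0,
        # Share of non-sad moments wrongly flagged as sad.
        "false_positive_rate": false_pos / negatives if negatives else 0.0,
    }


# Made-up ground truth and predictions for six frames.
truth = ["sadness", "sadness", "neutral", "joy", "sadness", "neutral"]
predicted = ["sadness", "neutral", "neutral", "joy", "neutral", "sadness"]
print(error_rates(truth, predicted))
```

In this toy example two of the three sad frames are missed, which is the kind of error that matters most for users who cannot put their feelings into words.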
As the elderly, especially those of categories 2 and 3, are vulnerable, their privacy should be protected. Information regarding their emotions can be used for their benefit, but can also be used against them, for example, to force psychological treatment if the patient does not score well enough on the 'happiness scale' as determined by the AI. Therefore, the system should be safe and secure. If possible, at least in the first stages secondary users can play a large role as well. Examples of such secondary users are formal caregivers and social contacts. The elderly person should be able to consent to the information regarding their emotions being shared with these secondary users. Therefore, only elderly people of category 1 were to be included in this study.
privacy and autonomy
“Amazon’s smart home assistant Alexa records private conversations” [24]. Headlines like this were all over the news less than a year ago and left many Americans in shock about their privacy. But if you think about the functionality of this smart home assistant, it should not come as a surprise that it is recording every word you say. To be able to react to its wake word “Hey Alexa”, it has to record and analyze everything you say. This is where the difference between “recording” and “listening” enters the discussion. When the device is waiting for its wake word, it is “listening”: recordings are deleted immediately after a quick local scan. Once the wake word is said, the device starts recording. However, contrary to Amazon's promises, these recordings were not processed offline, nor were they deleted afterwards. After online processing of the recordings, which would not necessarily lead to a privacy violation when properly encrypted, the recordings were saved on Amazon's servers to be used for further deep learning. The big problem was that Alexa would sometimes be triggered by words other than its wake word and start recording private conversations that were not meant to be stored in the cloud forever. These headlines affected the general public's feeling of privacy. Amazon was even accused of using data from its assistant to make personalized advertisements for its clients.
Introducing a camera into people's houses is a big invasion of their privacy, and if privacy is not seriously considered during development and implementation, problems similar to those in the Amazon case will arise.
Image processing
The question we would all ask when a camera is installed in our home is “What happens with the video recordings?”. In the case of Alexa, the recordings ended up on a server of Amazon. How will we prevent the video and audio recordings from the robot from getting published somewhere online? There is a simple solution for that; our software will run completely offline. In addition to that, the video recordings will be deleted immediately after processing. The processing itself is done by a neural network which will be trained in such a way that it only returns a string with the emotion. The neural network itself will gather more information which is needed for the processing, for example, the position of the face in the picture. This information cannot be obtained outside the network. This is important because the robot will probably have other functions for which it needs to be connected to the internet and these functionalities should not have access to other information than the currently detected emotion. In this way, we ensure that the only data which can be accessed outside the program will be the current emotional state.
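The idea that only the emotion label leaves the recognition component can be sketched as a narrow interface. This is a hypothetical design, not the project's software: the class `EmotionOnlyInterface`, the stand-in classifier and the label order are assumptions, and a real system would also have to guarantee that the network itself keeps no copies of the frame.

```python
from typing import Callable, Sequence

ALLOWED_LABELS = ("anger", "disgust", "joy", "fear", "surprise", "sadness", "neutral")


class EmotionOnlyInterface:
    """Expose only the current emotion label; frames and intermediate data stay inside."""

    def __init__(self, classify: Callable[[bytearray], Sequence[float]]) -> None:
        self._classify = classify  # hypothetical stand-in for the trained CNN

    def current_emotion(self, frame: bytearray) -> str:
        try:
            scores = self._classify(frame)
            best = max(range(len(scores)), key=scores.__getitem__)
            return ALLOWED_LABELS[best]
        finally:
            # Zero the frame buffer after processing, even if classification fails.
            frame[:] = bytes(len(frame))


def fake_cnn(frame):
    """Placeholder classifier that always returns the same made-up scores."""
    return [0.1, 0.0, 0.7, 0.0, 0.0, 0.1, 0.1]


recognizer = EmotionOnlyInterface(fake_cnn)
frame = bytearray(b"\x10\x20\x30")
print(recognizer.current_emotion(frame))  # joy
print(frame)                              # buffer now contains only zero bytes
```

Other robot functions that need internet access would be given only the returned string, never the frame or the face position.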
Deduced information
A closed program does not guarantee privacy protection. In the case of Amazon, we see that Amazon is accused of using Alexa’s “normal output” to create personal advertisements. Having a list of the emotional states of a person over the complete day is still a large invasion of their privacy. For local feedback, where the robot notices whether its action leads to happiness, this does not become a problem. But if we go back to the case study of Bart (where the robot suggests to the homecare nurses that they take Bart outside more often), personal information about Bart is being shared with an external party. In this case, the robot should first ask the elderly person for permission to share the data. This could decrease the effectiveness of the robot. Bart might not agree to ask the nurses to take him outside more often, for example because he thinks this would be a burden for them, and so decide not to share the information. But this is the price we must pay for privacy, and it is Bart’s right to make the decision.
Feeling of safety
Would you like a camera recording you on the toilet? Even though this camera only recognizes your emotions and keeps that information to itself unless you agree to share it, the answer would probably be “no”. This is probably because you would not feel safe if you were recorded during your visit to the toilet. When a robot with a camera is introduced into an elderly person's house, their feeling of safety will be affected. To minimize this impact, the robot will not be recording during the whole day. The camera will be integrated into the eye of the robot. When the robot is recording, the eyes are open. When it is not recording, the eyes will be closed and there will be a physical barrier over the camera. This should give the user a feeling of privacy when the eyes of the robot are closed. The robot will only record the elderly person when they interact with it. So as soon as they address it, its eyes will open, and it will start monitoring their emotional state until the interaction stops. Besides that, the robot will have a checkup function. After consultation with the elderly person, this function can be enabled. This will make the robot wake every hour and check the emotional state of the elderly person. If the elderly person is sad, the robot can start a conversation to cheer them up. If this person is in pain, the robot might suggest adjusting the medication to their primary carers.
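The recording behaviour described in this paragraph amounts to a small state machine. The sketch below is one possible reading of it, with hypothetical names (`RecordingPolicy`, `on_clock_tick`) and time expressed in seconds; the one-hour interval comes from the text, everything else is an assumption.

```python
from enum import Enum


class Eyes(Enum):
    CLOSED = "closed"  # physical barrier over the camera, not recording
    OPEN = "open"      # camera uncovered, emotion recognition running


class RecordingPolicy:
    """Open the eyes only during an interaction or an enabled hourly checkup."""

    def __init__(self, checkup_enabled: bool = False, checkup_interval_s: int = 3600) -> None:
        self.eyes = Eyes.CLOSED
        self.checkup_enabled = checkup_enabled  # only set after consulting the user
        self.checkup_interval_s = checkup_interval_s
        self._last_checkup_s = 0

    def on_user_addresses_robot(self) -> None:
        self.eyes = Eyes.OPEN

    def on_interaction_ended(self) -> None:
        self.eyes = Eyes.CLOSED

    def on_clock_tick(self, now_s: int) -> None:
        due = now_s - self._last_checkup_s >= self.checkup_interval_s
        if self.checkup_enabled and due and self.eyes is Eyes.CLOSED:
            self._last_checkup_s = now_s
            self.eyes = Eyes.OPEN


policy = RecordingPolicy(checkup_enabled=True)
policy.on_clock_tick(1800)
print(policy.eyes)   # Eyes.CLOSED, checkup not yet due
policy.on_clock_tick(3600)
print(policy.eyes)   # Eyes.OPEN, hourly checkup started
policy.on_interaction_ended()
print(policy.eyes)   # Eyes.CLOSED
```

Keeping the checkup flag off by default reflects the requirement that the function is only enabled after consultation with the elderly person.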
Privacy of third parties
There are situations in which there is another person in the house besides the user of the care robot. There are three cases in which this person might be recorded. The first case is when this third party starts interacting with the robot. We choose not to use face identification of the user to protect their privacy, so the robot will not know that it is recording a third party. But in our opinion, by starting a conversation with the robot the third party consents to being recorded. The second case is when the user interacts with the robot in the presence of a third party: the robot will focus on the user, and emotion recognition will be performed on the most prominent face in the video, which should be the user. The third case occurs when the checkup function is used. The robot might go for a checkup routine while a third party is in the house. If it sees the face of this party, it will perform emotion recognition and could start a conversation. This can only be prevented by an option to switch the checkup function off during the visit. The implementation of such a switch should be discussed when the decision is made to use this checkup function.
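Choosing the "most prominent face" can be approximated by picking the largest detected bounding box. This is an assumption about what prominence means, and the sketch presumes that some face detector (not shown) has already produced the boxes; the `FaceBox` type is hypothetical.

```python
from typing import NamedTuple, Optional


class FaceBox(NamedTuple):
    """Bounding box of one detected face, in pixels."""
    x: int
    y: int
    width: int
    height: int


def most_prominent_face(faces) -> Optional[FaceBox]:
    """Return the face with the largest bounding-box area, or None if no face is visible."""
    if not faces:
        return None
    return max(faces, key=lambda f: f.width * f.height)


# Made-up detections: the user close to the camera, a visitor in the background.
user = FaceBox(x=200, y=120, width=180, height=220)
visitor = FaceBox(x=520, y=90, width=60, height=75)
print(most_prominent_face([visitor, user]))
```

Note that this heuristic fails when a visitor stands closer to the camera than the user, which is one reason the text says the most prominent face "should be" the user rather than that it always is.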
Enterprise
Enterprises who develop the robot
There already are enterprises developing robots that can determine facial expressions to a certain extent; see, for example, the previous sub-section 'Possible applications' in the state-of-the-art technology section. The robot of the SocialRobot project is another example. Except for the SocialRobot, none of these applications were developed especially for elderly people. However, these kinds of enterprises could be interested in software or a robot that is specialized in facial expression recognition for elderly people, since they already develop the robots and the designs needed for a robot to be usable in elderly care. Because at the moment there are no robots that can determine the facial expressions of elderly people specifically, and the demand for such robots could increase in the near future, these enterprises could be willing to invest in facial expression recognition software for elderly people. Enterprises that develop care robots could also be interested in such software. Furthermore, enterprises that make companion robots, such as Paro, could implement such software in their robots too.
Enterprises who provide healthcare services
The market for the robot can be found in care homes or in the homes of elderly people. Care homes could be interested in the development of a robot with facial expression recognition, because they can have the robot developed and built according to their own wishes when they are closely involved. However, care homes could also reject this robot because they think that robots should not be present in the health care system, or simply because they do not think that the robot fits in their care home. Since the population of the Netherlands is aging, care homes face a shortage of employees, which will grow in the near future. Therefore, care homes are searching for alternatives, and this robot can be a solution to their problem.
Our robot proposal
SocialRobot project
The SocialRobot (see fig. 5) was developed by a European project that aimed to aid elderly people in living independently in their own house and to improve their quality of life by helping them maintain social contacts. The robotic platform has two wheels and is 125 cm tall, so that it looks approachable. Its equipment includes, among other things, a camera, an infrared sensor, microphones, a programmable array of LEDs that can show facial expressions, and a computer. Its battery pack allows it to operate continuously for 5 hours. But most importantly, the SocialRobot is able to analyze the emotional state of its user via audio recordings.
In the original SocialRobot project, the robot identifies the elderly person using face recognition and then reads their emotion from the response that they give to its questions using voice recognition. The speech data is analyzed, and from this the emotion of the elderly person can be derived. The accuracy of this system was 82% in one of the papers published by the SocialRobot researchers (source 2). The idea is that the robot uses the response as input for the actions it takes afterwards, e.g. if the person is sad, it will encourage them to initiate a Skype call with their friends. If the person is bored, it will encourage them to play cards online with friends. We think that the quality of care provided by this robot could significantly increase when facial emotion recognition is used as well.
What can our robot do?
The SocialRobot platform can execute various actions. Based on the actions, a minimum standard for the accuracy of facial emotion recognition can be specified, as the actions can have varying degrees of invasiveness.
- facial recognition: The robot can recognize the face of the elderly person and their carers.
- emotion recognition: Using a combination of facial data and speech data, the robot classifies the emotions of the elderly person.
- navigate to: The robot can go to a specific place or room in the elderly person's environment.
- approach person: The robot can use the same principle to detect a person and approach them safely.
- monitoring: Monitoring of the elderly person's environment.
- docking/undocking: The robot can autonomously drive towards a docking station to charge its batteries.
- SoCoNet call: The robot can retrieve and manage the information of the elderly in a so-called SoCoNet database.
- speech synthesis: The robot can use predefined text-to-speech recordings to verbally interact with the user.
- information display: The robot can display information on a tablet interface.
- social connection: The robot has a Skype interface to establish communication between the elderly person and their carers/friends.
Emotion Classification
The current package of emotion recognition in the SocialRobot platform contains the following emotions: Anger, Fear, Happiness, Disgust, Boredom, Sadness, Neutral. This is because the researchers of the project have used openEAR software which employs the Berlin Speech Emotion Database (EMO-DB) [25]. However, not all of these emotions are equally important for our application. In addition to that, we do not recognize boredom as an emotion, as this is rather a state. The most important emotions for the treatment of elderly persons are probably anger, fear, and sadness. For the purposes of our research project, boredom and disgust could be omitted, as the inclusion of unnecessary features may complicate the classification process.
Actions
The actions described under "What can our robot do?" can be combined so that the robot can fulfill several more complex tasks. We have chosen 3 tasks that are especially important for the care of lonely elderly people, given the privacy requirements described in the previous sub-section. These three tasks are associated with the emotion 'sadness', but of course, many other actions can be envisioned. For each of the other emotions, we illustrate possible actions, but in less detail than for the emotion 'sadness'. This approach is similar to that of source 3; the difference is that source 3 investigates the application of speech emotion recognition specifically, while we mainly investigate the influence of facial emotion recognition.
anger
It is possible that the elderly person does not accept the SocialRobot in their house at times, and that they do not want to have the SocialRobot near them while they are angry. In the case that a high anger score is reached, the SocialRobot could ask the elderly person whether they want the SocialRobot to move back towards its docking station. If the docking station is inside the home of the elderly person, the SocialRobot could additionally alert a carer to ensure that no harm comes to either the robot or the elderly person.
fear
Many elderly people have mobility issues. If the elderly person under the surveillance of the SocialRobot falls, they will probably score high under fear. The SocialRobot can then carefully move towards the location of the elderly person and ask them whether they want the robot to request help. If the answer is yes, the robot can reassure the elderly person that help is on the way and alert their carer.
happiness
Suppose that on the birthday of the elderly person, their grandchildren visit them. They will likely score high on the happiness score, as they are happy to see their family. In such a situation, the robot does not really have to carry out an action, as it is not immediately necessary. However, the robot could further facilitate the social connection between the elderly person and their grandchildren by letting them play a game together on the user interface.
neutral
Just like with happiness, if the facial expression of the elderly person is neutral, there is no direct reason to carry out an action. However, it is important for the robot to keep monitoring the face of the elderly person. It is possible that the robot simply does not have enough information to classify the facial expression. If this is the case, the robot could ask the elderly person a question, for example, whether they are happy. The answer and the facial expression could give the robot more information about the state of the person.
The three actions for sadness have been illustrated with a chronological overview of a potential situation where the person is sad. Based on the score for sadness, the SocialRobot chooses one of the three actions.
Get Carer
This action is most useful in elderly care homes. As elderly care homes are becoming increasingly understaffed, a care robot could help out by entertaining the elderly person or by determining when they need additional help from a human carer. The latter is a complicated task, as the autonomy of the elderly person is at stake, and each elderly person will have a different way of communicating that they need help. It is possible to include a button in the information display of the robot which says 'get carer'. In this case, the elderly person can decide for themselves whether they need help. This could be helpful if the elderly person is not able to open a can of beans or does not know how to control the central heating system. This option is well suited to our target group of elderly people with full cognitive capabilities, and to simple actions that have a low risk of violating the dignity of the elderly person.
However, if the elderly person feels depressed and lonely, it can be harder for them to admit that they need help. The robot could use emotion recognition to determine a day-by-day score (1-10) for each emotion. These scores can be monitored over time in the SocialRobot's SoCoNet database. The elderly person should give consent for the robot to share this information with a carer. A less confronting approach might be to show the elderly person the decrease in happiness scores on the display and to ask them whether they would want to talk to a professional about how they are feeling. Then, a 'yes' and 'no' button might appear. If the elderly person presses 'yes', the robot can get a carer and inform them that the elderly person needs someone to talk to, without reporting the happiness scores or going into detail what is going on.
SocialRobot | Elderly person | Carer |
---|---|---|
Moves to the elderly person's location | ||
Asks elderly person if they want to request help | Responds to SocialRobot "yes" or presses "yes" button on the display | |
SocialRobot turns camera toward elderly person's face | Has a sad face | |
Recognises elderly person's face | ||
Classifies elderly person's 'sadness' emotion with a score above 7/10 | | Receives a notification of the elderly person's emotion score and that they need help |
| | Goes towards the elderly person |
Call a friend
Lonely elderly people often do have some friends or family, but these might live somewhere else and not visit often. The elderly person should not become socially isolated, as isolation is an important contributor to loneliness. As mentioned before, loneliness is not only detrimental to their quality of life, but also to their physical health. Therefore, it is important for the care robot to provide a solution to this problem. The SocialRobot platform could monitor the elderly person's emotions over time, and if 'sad' scores higher than usual, the robot could encourage the elderly person to initiate a Skype call with one of their relatives or friends via the 'social connection' function. The elderly person should still press a button on the display to actually start the call, to ensure autonomy.
SocialRobot | Elderly person | Friend |
---|---|---|
Moves to the elderly person's location | ||
Initiates a conversation about the weather | Responds to SocialRobot "I am sad that I cannot go outside" | |
Recognises elderly person's answer | ||
SocialRobot turns camera toward elderly person's face | Has a sad face | |
Recognises elderly person's face | ||
Classifies elderly person's 'sadness' emotion with a score below 7/10, but above 4/10 | ||
Asks elderly person if they want to call a friend | Responds "yes" or pushes "yes" button on the display | |
Initiates a video call with the elderly person's favourite contact | | Receives call notification |
Have a conversation
Sometimes, the lonely person just needs someone to talk to but does not have relatives or friends available. In such cases, the robot might step in and have a basic conversation with the elderly person about the weather or one of their hobbies. When humans have a conversation, they tend to move closer to each other, if only to understand each other better. The SocialRobot can approach the elderly person by recognizing their face and moving towards them.
The initiation of the conversation could be done by the elderly person themselves. For this purpose, it might be a good idea to give the SocialRobot a name, like 'Fred'. Then, the elderly person can call out Fred, so that the robot will undock itself and approach them. If the initiation of the conversation comes from the robot, it is important for the robot to know when the elderly person is in a mood to be approached. If the person is 'sad' or 'angry' when the robot is in the midst of approaching the person, then the robot should stop and ask the elderly person 'do you want to talk to me?'. If the elderly person says no, the robot will retreat.
SocialRobot | Elderly person | |
---|---|---|
Moves to the elderly person's location | ||
Initiates a conversation about the weather | Responds to SocialRobot "I am sad that I cannot go outside" | |
Recognises elderly person's answer | ||
SocialRobot turns camera toward elderly person's face | Has a sad face | |
Recognises elderly person's face | ||
Classifies elderly person's 'sadness' emotion with a score below 4/10 | ||
Asks elderly person whether they want to continue the conversation | Responds "yes" or pushes "yes" button on the display | |
Asks a question about the person's favourite hobby |
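The three scenario tables above imply a simple mapping from the sadness score to an action. The snippet below is a minimal sketch of that mapping, not the SocialRobot implementation. The function name and the action labels are hypothetical, and the handling of the exact boundary values 4 and 7 is an assumption, since the tables only speak of scores "above" and "below" these thresholds.

```python
def choose_sadness_action(sadness_score: float) -> str:
    """Map a sadness score out of 10 to one of the three actions.

    Thresholds follow the scenario tables: above 7 -> get a carer,
    between 4 and 7 -> suggest calling a friend, below 4 -> conversation.
    Assumption: a score of exactly 7 counts as 'call_friend' and a score
    of exactly 4 as 'have_conversation'.
    """
    if not 0 <= sadness_score <= 10:
        raise ValueError("sadness score must lie between 0 and 10")
    if sadness_score > 7:
        return "get_carer"
    if sadness_score > 4:
        return "call_friend"
    return "have_conversation"


if __name__ == "__main__":
    for score in (8.5, 5.5, 3.0):
        print(score, "->", choose_sadness_action(score))
```

In each scenario the robot would still ask the elderly person for confirmation before acting, so the returned label is only a suggestion.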
Consequence analysis
The actions described in the previous section can have several consequences when the emotion classification results are corrupted by false positives or false negatives.
While most current literature on emotion classification is geared towards improving the reliability of the classification, it does not set a minimum classification accuracy per emotion. It therefore seems that this is a relatively unexplored field of technological ethics. Instead of setting a minimum value, we have thus decided to solely regard the consequences of false positives and false negatives for each emotion. For the purposes of answering our research question, false positives and negatives concerning the recognition of 'sadness' were found to be particularly important.
anger
FN
If the system has a false negative for anger, this might have immense consequences. Even though the robot is not able to physically touch the person, the robot might still unwillingly enter the elderly person's private space. If the elderly person is angry, this might result in harm to the robot or the person themselves. The cost of a false negative for anger is tremendous, as it has both a significant material cost (in potential damage to the SocialRobot system) and a physical cost (the elderly person might harm themselves), impeding the safety of the elderly person.
FP
A false positive might result in unnecessary workload on behalf of the carer. The material cost of this is much lower than that of a false negative for anger.
fear
FN
If fear goes undetected in the situation described in the previous section, this might have serious consequences.
FP
A false positive results in unnecessary workload for the carer.
happiness
FN
A false negative for happiness results in unnecessary actions carried out by the robot and potentially the carer as well. It might also reduce the elderly person's trust in the system, as the SocialRobot quite obviously suggests an action that does not fit with the positive mood of the person.
FP
If a false positive occurs for happiness, the consequences depend on the true emotion. However, as all other emotions are more negative than happiness, it probably means that the SocialRobot is negligent, not carrying out an action that it should have carried out, had it recognized the correct emotion.
neutral
FN
Upon falsely detecting an expression other than neutral, the system will likely carry out an unnecessary action. The consequences are similar to those described for the false negatives for happiness.
FP
In the case of a false positive for the neutral emotion, the elderly person's true emotion might be more positive (happy) or more negative. For the latter, the consequences are similar to those described for the false positives for happiness. For the former, there will be virtually no consequences, as the robot treats happiness and neutral emotions almost the same.
sadness
FN
For a false negative, the consequences are immense. The main point of this research project is to identify sadness for the target group of lonely elderly people. If sadness is not identified, the robot does not have a positive effect, or might even have an adverse effect.
FP
When a false positive occurs for sadness, the robot either starts talking to the person without a reason, unnecessarily suggests calling a friend, or alerts (in)formal carers. The latter can result in unnecessary workload. In general, it might be infantilizing to the elderly if their emotions are misjudged but still acted on, leading to a loss of trust in the technology.
Suggestions for improving the reliability of emotion classification
https://sail.usc.edu/publications/files/Busso_2004.pdf
This article reports that multimodal emotion classification is significantly better than emotion classification based on one mode (audio or images). Besides that, the authors base each classification on several recorded frames. If happiness is displayed in more than 90 percent of the frames, the person is classified as happy, whereas the person is already classified as sad when sadness is displayed in more than 50 percent of the frames.
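The frame-based rule summarized above can be sketched as follows. This is only an illustration of the two thresholds as they are described here, not the method from the cited article. The function name, the label strings and the choice to return `None` when neither threshold is met are assumptions.

```python
from collections import Counter


def classify_sequence(frame_labels, happy_share=0.9, sad_share=0.5):
    """Classify a sequence of per-frame labels with asymmetric thresholds.

    'sadness' needs more than 50% of the frames, 'happiness' more than 90%.
    Returns None (an assumption) when neither threshold is reached.
    """
    if not frame_labels:
        raise ValueError("at least one frame label is required")
    counts = Counter(frame_labels)
    total = len(frame_labels)
    if counts["sadness"] / total > sad_share:
        return "sadness"
    if counts["happiness"] / total > happy_share:
        return "happiness"
    return None


if __name__ == "__main__":
    frames = ["sadness"] * 6 + ["neutral"] * 4
    print(classify_sequence(frames))
```

The lower threshold for sadness reflects the consequence analysis above: missing sadness is considered more costly than missing happiness.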
https://link.springer.com/article/10.1007/s12369-015-0297-8
This article stresses that it is important to improve the reliability of the classification and that this can be done by using more realistic databases and by fusing the results of multiple databases.
https://link.springer.com/article/10.1007/s12369-012-0144-0
This article notes that the user will only trust the robot if it is accurate in reading their emotions and basing the actions upon this info.
Data sets
The neural network needs to be trained on a dataset with annotated pictures. The pictures in this dataset should contain the faces of people and should be labeled with their emotion. To make sure our neural network works on elderly people, there should be at least some pictures of the elderly in the dataset. Creating our own dataset with pictures of good quality would be a lot of work. Besides that, gathering this data from elderly participants gives rise to a lot of privacy issues and is subject to privacy legislation. We therefore looked for an existing dataset that we could use. Below is a list of the databases we considered, with some pros and cons. This list is based on the one on the Wikipedia page “Facial expression databases” [26]. Most of these datasets were gathered by universities and are available for further research, but access must be requested.
- Cohn-Kanade: We got access to this dataset. Unfortunately, we could not work with the annotation, as a lot of labels seemed to be missing.
- RAVDESS: only videos. As we want to analyze pictures, it is better to have annotated pictures than frames from annotated videos.
- JAFFE: only Japanese women. Not relevant to our research.
- MMI:
- Faces DB: Dataset of our choice. It contains well-annotated pictures from 35 subjects of different age categories.
- Belfast Database
- MUG: 35 women and 51 men participated in this database, all of Caucasian origin and between 20 and 35 years of age. The men are with or without beards. The subjects are not wearing glasses, except for 7 subjects in the second part of the database. There are no occlusions except for a few hairs falling on the face. The images of 52 subjects are available to authorized internet users. The data that can be accessed amounts to 38GB.
- RaFD Request for access has been sent (Rik). Access denied.
- FERG[27] avatars with annotated emotions.
- AffectNet[28]: Huge database (122GB) containing 1 million pictures collected from the internet, of which 400 000 are manually annotated. This dataset was too big for us to work with.
- IMPA-FACE3D: 36 subjects, of which 5 elderly; open access.
- FEI only neutral-smile university employees. This dataset does not contain elderly.
- Aff-Wild downloaded (Kilian)
Facial Emotion Recognition Network
There is also a technical aspect of our research. This involves creating a convolutional neural network (CNN) that can differentiate between seven emotions from facial expressions only. The seven emotions used are mentioned in the emotion classification section of our robot proposal.
This CNN needs to be trained before it can be used. This training is done by providing the CNN with the data you want to analyze (in this case images of facial expressions) together with the correct label, i.e. the output you would want the network to give for this input. The network iteratively optimizes its weights by minimizing, for instance, the sum of the squared errors between the predicted value and the ground truth (the given labels). During this training, separate validation data is also used, and the accuracy and loss on this validation set are shown during training. The validation set is used to see whether the network is overfitting on the training data. In that case, the network would just be learning to recognize the specific images in the training set, instead of learning features that can be generalized to other data. Overfitting is detected by looking at how the network performs on the validation set. After training, to see how well the CNN performs, another separate dataset called the test set is used.
For the training, validation and initial testing, a dataset called FacesDB [29] is used. There will be two test sets, one of people with estimated ages below 60, and one of people with estimated ages above 60. After this, the goal is to perform a test on elderly people, where images of their facial expressions are processed in real-time. The first thing to be tested is whether a network trained only on images of people younger than the estimated age of 60 will still predict the right emotions for elderly people. The initial plan was to get this working, and then make it work in real-time. However, it was decided to change priorities and to put getting the real-time program working ahead of improving the classification of elderly people.
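To make the loss mentioned above concrete, the sketch below computes the sum of squared errors between a predicted score vector and a one-hot label for seven classes. The example vectors are invented for illustration, and the function name is hypothetical. The loss that was actually used to train the network is not stated here.

```python
def sum_squared_error(predicted, target):
    """Sum of squared differences between predicted scores and the label."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


# One-hot ground truth: the third of seven emotions is the correct one.
target = [0, 0, 1, 0, 0, 0, 0]
# Two hypothetical network outputs for the same image.
confident_and_right = [0.05, 0.05, 0.70, 0.05, 0.05, 0.05, 0.05]
mostly_wrong = [0.30, 0.30, 0.10, 0.10, 0.10, 0.05, 0.05]

if __name__ == "__main__":
    print(sum_squared_error(confident_and_right, target))
    print(sum_squared_error(mostly_wrong, target))
```

Training lowers this value on the training set. If it keeps falling there while rising on the validation set, the network is overfitting.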
Dataset
The dataset that is used for training, validation and initial testing is called FacesDB. FacesDB was chosen because it contains fully labeled images of people of different ages. Moreover, this dataset also contains elderly participants, which is relevant since this is our target audience. The dataset consists of 38 participants, each acting out the seven different emotions. In this dataset, five of the 38 participants are considered to be “elderly”. These five will be excluded from all other sets and put into a separate test set. There will also be a validation set, consisting of 3 participants, and a “younger” test set, also consisting of 3 participants.
The images in this dataset are 640x480 pixels, with the faces of the participants in the middle of the image. The background of all images is completely black.
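The participant-level split described above can be sketched as follows. With 38 participants, 5 elderly, 3 for validation and 3 for the younger test set, 27 participants remain for training. The function name, the participant IDs and the random selection of the validation and test participants are assumptions, as the text does not say how these participants were chosen.

```python
import random


def split_participants(participant_ids, elderly_ids, n_val=3, n_test=3, seed=0):
    """Split participants (not individual images) into four disjoint sets."""
    elderly = sorted(set(elderly_ids))
    younger = sorted(set(participant_ids) - set(elderly))
    if len(younger) <= n_val + n_test:
        raise ValueError("not enough younger participants left for training")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(younger)
    return {
        "test_elderly": elderly,
        "validation": sorted(younger[:n_val]),
        "test_younger": sorted(younger[n_val:n_val + n_test]),
        "train": sorted(younger[n_val + n_test:]),
    }


if __name__ == "__main__":
    ids = list(range(1, 39))        # hypothetical IDs 1..38
    elderly = [34, 35, 36, 37, 38]  # hypothetical IDs of the five elderly
    split = split_participants(ids, elderly)
    print({name: len(members) for name, members in split.items()})
```

Splitting by participant rather than by image prevents the same face from appearing in both the training set and a test set.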
Architecture
For a complicated problem like this, a simple CNN does not suffice. This is why, instead of building our own CNN, for now, a well-known image classification model is used. This CNN is called VGG16.[30] This network has been trained for the classification of 1000 different objects in images. Using transfer learning, we will be using this network for facial emotion classification. As can be seen in Figure 7, the CNN is built up of several layers. Some of these layers are convolutional layers that use filters to extract data from the input they receive, hence the name convolutional neural network. The data is transferred between these layers, from the input towards the output, using weighted connections. These weights scale the data when it is transferred to the next layer and are the part of the CNN which is actually optimized during training. Since the VGG16 network is already trained, there is no need to train all these weights again. The only thing left to do is to teach the network to recognize the seven emotions that are needed as output, instead of 1000 different objects. First, the output layer itself is changed to have the number of outputs needed for the problem at hand, which is seven. Next, the weights connecting to the output are trained. This is done by "freezing" all the weights except for those connected to the output layer, so that only the latter can be altered during training. Then the network, and thus essentially only the final set of weights, is trained on the data.
One problem that was encountered whilst working with the VGG16 network is that, since it has been previously trained, it expects a certain input image size. This size is 224x224 pixels. As mentioned before, the dataset which was used contains pictures of 640x480 pixels. This means the pictures have to be cropped to the correct size. This has to be done specifically around the face since this is the relevant area.
Real time data cropping
For a real-time application of our software, we use the built-in webcam of a laptop. Using OpenCV 4.1 [31], getting video footage is straightforward. However, the size of the obtained video frames is not 224 by 224 pixels, which gives rise to a similar problem as mentioned above for the database pictures. Two possible solutions are downsampling the video frames or cropping the images. The images from the database are rectangles in portrait orientation. The images from the webcam are rectangles in landscape orientation. Downsampling these to the square format would lead to wide and long faces respectively, which cannot be compared properly. We therefore decided it was necessary to crop the faces. The difficulty with cropping is that you do not want to cut away (parts of) the face from the image. Using a pretrained face recognition CNN, the location of the face in the image can be determined. After that, a square image around this face is cut out, which can be downsampled if necessary to obtain a 224 by 224-pixel image. Based on the “Autocropper” application [32], we wrote code which performs the tasks described above.
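The geometric part of this cropping step can be sketched without any image library. The function below takes a face bounding box, as a face detector might return it, and computes a square crop region that stays inside the frame. It is a minimal sketch and not the code based on Autocropper. The function name, the `(x, y, width, height)` box format and the 20% margin around the face are assumptions.

```python
def square_crop_box(face_box, image_size, margin=0.2):
    """Return (left, top, side) of a square crop centred on the face.

    face_box   -- (x, y, width, height) of the detected face in pixels
    image_size -- (width, height) of the full frame in pixels
    margin     -- extra space around the face as a fraction of its size
    """
    x, y, w, h = face_box
    img_w, img_h = image_size
    # Square side: the larger face dimension plus a margin on both sides,
    # but never larger than the frame itself.
    side = int(round(max(w, h) * (1 + 2 * margin)))
    side = min(side, img_w, img_h)
    # Centre the square on the face, then shift it back inside the frame.
    left = int(round(x + w / 2 - side / 2))
    top = int(round(y + h / 2 - side / 2))
    left = max(0, min(left, img_w - side))
    top = max(0, min(top, img_h - side))
    return left, top, side


if __name__ == "__main__":
    # A face in the middle of a 640x480 webcam frame.
    print(square_crop_box((270, 170, 100, 140), (640, 480)))
```

The resulting square region can then be cut out of the frame and scaled to 224 by 224 pixels, for instance with OpenCV's `cv2.resize`, without distorting the proportions of the face.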
Results
The resulting program for real-time use can be seen in Figure 8. The webcam image is cropped to the appropriate size around the face and is then used as input for the model. The program returns the emotion it recognizes in the video frame and a percentage to indicate how certain it is about this. The output is shown at the bottom of the image. In the output, it can be seen that the certainty varies between 58% and 89%. It can also be seen that, even though the test subject is clearly smiling, the model predicted different emotions throughout the demo. Figure 8 depicts a demo, which was done in real time. During this real-time demo, the network only outputs a prediction when it recognizes an emotion with at least 50% certainty over 5 video frames. This is done to prevent errors due to the model or to bad quality frames in the video footage. Besides testing the real-time version of the program, two test sets of images were used to compare the predicted emotion with the true emotion. This can be seen in Figures 9 and 10. The test set seen in Figure 9 only contains images of people that were estimated to be "elderly". The test set that can be seen in Figure 10 only contains images of people who were not considered "elderly".
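The filtering of predictions over 5 video frames can be sketched as below. The text leaves the exact rule open, so this sketch assumes one possible reading: a prediction is only output when the same emotion is the top prediction in each of the last 5 frames, every time with a certainty of at least 50%. The class name and its interface are hypothetical.

```python
from collections import deque


class StablePrediction:
    """Only report an emotion once it has been stable over several frames."""

    def __init__(self, window=5, min_confidence=0.5):
        self.window = window
        self.min_confidence = min_confidence
        self.recent = deque(maxlen=window)  # keeps only the last `window` frames

    def update(self, label, confidence):
        """Add the prediction for one frame; return a label or None."""
        self.recent.append((label, confidence))
        if len(self.recent) < self.window:
            return None  # not enough frames seen yet
        same_label = len({lab for lab, _ in self.recent}) == 1
        confident = all(conf >= self.min_confidence for _, conf in self.recent)
        return label if same_label and confident else None


if __name__ == "__main__":
    stable = StablePrediction()
    frames = [("happiness", 0.8)] * 5 + [("fear", 0.6)]
    for label, confidence in frames:
        print(label, confidence, "->", stable.update(label, confidence))
```

A single misclassified or blurry frame then suppresses the output for the next few frames instead of producing a wrong prediction.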
As can be seen, when comparing Figures 9 and 10, the performance of the network on elderly people is worse than on the younger group, with only 11 out of 35 true positives in Figure 9 and 22 out of 35 true positives in Figure 10. Besides this, fear is a common false positive for both of the test sets. False positives that seem to occur more frequently in the test set only containing elderly are joy and sadness.
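The counts above translate into accuracies as shown in this short calculation. The chance level for seven equally likely emotions is added for comparison. The function name is hypothetical.

```python
def accuracy(correct, total):
    """Fraction of test images whose predicted emotion was the true emotion."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


elderly_accuracy = accuracy(11, 35)  # test set of Figure 9
younger_accuracy = accuracy(22, 35)  # test set of Figure 10
chance_level = 1 / 7                 # random guessing among seven emotions

if __name__ == "__main__":
    print(f"elderly: {elderly_accuracy:.1%}")
    print(f"younger: {younger_accuracy:.1%}")
    print(f"chance:  {chance_level:.1%}")
```

Both test sets score above chance, but with only 35 images per set these percentages are rough estimates.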
Due to the relatively small dataset and rapid overfitting of the network, a version of the network was also trained with pictures of several members of this group added to the training set. It was expected, and observed, that this increased the performance of the network on the faces of the members which were included in the training set. Besides this, it also positively impacted the predictions made on other faces. This impact was, however, smaller than that on the faces present in the training set.
Conclusion
The automatic cropping of the input image works very well: it is able to recognize faces in most cases where it should, even when the user is not sitting right in front of the camera but is slightly turned away, or when their face is not in the center of the screen.
The network seems to perform better on younger people than on elderly people. This could be due to the wrinkles that mostly appear in the faces of the elderly throwing off the network's prediction. Another reason might be that, since the emotions were acted and not natural, the elderly people had a more difficult time acting out these emotions. The emotions in some images were difficult to interpret even for humans, which means this could very well be a cause of the lower accuracy. Overall, this network seems to lack good training data. Not all emotions were acted out very convincingly or recognizably. Besides this, the noise of the background and body language (for instance leaning forwards or backwards) seemed to influence the prediction of the model in a significant way. This is most likely due to the images in FacesDB all having a black background. Since a black background is rarely the case in a real-life application, this has to be improved before the program can actually be used. The fact that body language influences the prediction is not necessarily a problem, since it is also a human tool of communication. This could, however, lead to many errors due to the positioning of the camera relative to the user: the camera angle might suggest that the user is leaning forward while this is not the case, leading to a wrong prediction.
The real-time program compensates for some of the errors made by the model. The fact that it only outputs a prediction when an emotion is predicted with sufficient certainty over 5 video frames improves the accuracy of the output. However, it also decreases the number of outputs of the program. This is not necessarily a bad thing, but other methods could be implemented that do not result in fewer outputs.
Further Research
For further research, several things are suggested. The main point would be to increase the amount of training data used. Besides the amount of input data used, the emotion quality, background, and lighting could be relevant. This is especially the case if the size of the dataset is small. When this training set becomes larger, the influence of single pictures containing abnormal background, lighting or expression of emotions becomes smaller. Besides this, to compensate for the fact that the background and body language could influence the prediction, data manipulation could be used to segment the face of the user more precisely. It could also be used to enhance features, such as edges, to make it easier for the network to recognize emotions. Further research would also be needed to confirm whether a convolutional neural network trained to recognize facial emotions on "younger" people, in fact, does perform worse on the elderly. This was suggested by this research but could be due to a few matters mentioned before.
Survey
The set-up
The Technology Acceptance Model (TAM) [33] states that two factors determine the elderly person's acceptance of robot technology:
- perceived ease-of-use (PEOU)
- perceived usefulness (PU)
The TAM model has successfully been used before on 'telepresence' robot applications for elderly. Telepresence is when a robot monitors the elderly person using smart sensors, such as cameras and speech recorders in the case of the SocialRobot project. Just like in [34], the TAM model has been used to formulate a questionnaire, where the elderly person can indicate to what extent they agree with a statement on a 5-point Likert scale. This scale goes from "I totally agree", which is given the maximum score of 5, to "I totally disagree", which is given the minimum score of 1. As most elderly people in the Netherlands have Dutch as their mother tongue, the questionnaire was translated to Dutch. First follows an introduction to the questionnaire.
Introduction to the questionnaire (translated from Dutch)
We are researching the social acceptance of a care robot that uses facial emotion recognition. We would like to hear your opinion on this. Your answers will be processed anonymously (without your name or personal details). The results will be used in a user case study. An introduction to the robot technology follows below, followed by the questionnaire.
We take the robot from the SocialRobot project as a basis. The SocialRobot project is a European project in which a care robot for elderly care has been developed. This robot can move around the house independently. It can interact with the user both through speech and through a touchscreen. It also has internet access and can therefore contact third parties such as home care services or family. We want to extend this robot with a camera that is used for facial emotion recognition. When the robot interacts with the user, the camera is activated to read the user's facial emotion. In this way, conversations, for example, can be better matched to the user's emotional state. To safeguard the user's privacy, the camera images are deleted immediately after analysis. The data obtained about the user's emotion are only shared with others if the robot has been given permission to do so. For example, if the robot notices that the user is often in pain, it will suggest that the user share this information with the doctor, so that the doctor can prescribe better painkillers. The robot could be fitted with a cover that slides over the camera when it is not recording, to strengthen the user's sense of privacy. Finally, there is an optional “check-up” function. When this function is switched on, the robot will seek out the user several times a day and determine their emotional state. If the robot sees, for example, that someone is in a melancholy mood, it can start a conversation to distract them.
A number of statements now follow. You can give your opinion on these statements by ticking whether you totally agree / agree / are neutral / disagree / totally disagree. Thank you in advance for your contribution.
Survey Questions
Perceived Usefulness
- Having the SocialRobot platform in my house would enable me to accomplish tasks more quickly and be more productive. (Als ik Fred (de naam van onze SocialRobot) in mijn huis zou hebben dan zou ik sneller taken uit kunnen voeren, zoals huishoudelijke taken, en productiever zijn)
- I would find the SocialRobot platform useful. (Ik zou Fred nuttig vinden)
Perceived Ease of Use
- Learning to operate the SocialRobot with my voice or with a tablet would be easy for me. (Ik zou Fred makkelijk kunnen aansturen met een tablet of met mijn stem)
- My interaction with the SocialRobot would be more clear and understandable when it is able to read my emotions. (Ik beter kunnen communiceren met Fred als hij mijn emoties kon zien)
For the sake of brevity and applicability, some of the questions from the TAM model were omitted. This was because the nuances between these questions and the questions listed above were unclear, or unclear in translation or because the questions were more relevant for secondary users than for elderly people.
Study-specific questions
To investigate the effect of this robot on the feeling of privacy and loneliness, the following questions were added to the questionnaire:
- I would have a conversation with the SocialRobot about the weather. (Ik zou het met Fred over het weer hebben)
- I would tell the SocialRobot how I'm feeling. (Ik zou Fred vertellen hoe ik me voel)
- Automatic emotion recognition by the robot would help me interact with it. (Het zou me helpen in de interactie met de robot als hij automatisch mijn emoties kan herkennen)
- I would appreciate it if the SocialRobot suggested actions (like calling a family member or going outside to buy groceries) when I feel lonely. (Ik zou het waarderen als Fred voorstellen doet (zoals het opbellen van een vriend of naar buiten gaan om boodschappen te doen) als Fred merkt dat ik eenzaam of verdrietig ben)
- I would feel comfortable with video and audio recordings by the SocialRobot, as long as I know they are deleted afterwards and not sent to my carers without my permission. (Ik zou het niet erg vinden als er audio- en videoopnamen van mij gemaakt worden door Fred, als ik zeker weet dat deze opnamen daarna verwijderd worden en niet naar mijn hulpverleners/mantelzorgers gestuurd kunnen worden zonder mijn toestemming)
Questionnaire results and discussion
Only the most important results will be addressed in this section. For the full results, see: https://docs.google.com/spreadsheets/d/1KCbSxQH2MRkKDhnIVomq7SVeq_vJqv-1geSBdY7-W2M/edit#gid=491855232
Because permission was denied to carry out the questionnaire amongst elderly care home inhabitants (see "overall discussion"), the survey was carried out in a group (n=9) of elderly people who live independently. The inclusion criteria were 'being older than 70' and 'having a sufficient mental state'. The first criterion was included because, as robot technology advances quickly, this is going to be the first age group that will experience the use of care robots in their future elderly homes. The second criterion was used for the sake of consent: only elderly people with sufficient mental capabilities could willingly consent to fill in the survey. Both criteria were upheld by sending the survey only to respondents who were acquaintances of the researchers of group 8. The average age of the respondents was 79, with the youngest being 70 and the oldest being 86.
Because the number of respondents was quite low, the results are not assumed to be statistically significant. However, they still provide an exploratory outlook on the acceptance of our robot technology in the group that will likely encounter it first. The insights derived from this survey could reshape the design of the emotion recognition software.
perceived usefulness
Our respondents were divided on their opinion about the perceived usefulness of the robot in general. Concerning their productivity with the robot in use, 4 respondents checked 'neutral', 2 checked 'agree', 2 checked 'disagree', and 1 checked 'fully disagree'. The exact same division, albeit answered by different respondents, was portrayed for the usefulness of the robot.
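The answer counts reported above can be turned into summary scores on the 5-point Likert scale (5 for "totally agree" down to 1 for "totally disagree"). The sketch below is only a worked calculation on the reported counts. The dictionary keys and function name are hypothetical, and 'fully disagree' is taken to be the lowest option of the scale.

```python
from statistics import median

# Scores of the 5-point Likert scale used in the questionnaire.
LIKERT_SCORES = {
    "totally agree": 5,
    "agree": 4,
    "neutral": 3,
    "disagree": 2,
    "totally disagree": 1,
}

# Reported answers to the productivity statement (n = 9).
productivity_counts = {"neutral": 4, "agree": 2, "disagree": 2, "totally disagree": 1}


def expand_scores(counts):
    """Turn answer counts into a flat list with one score per respondent."""
    scores = []
    for answer, count in counts.items():
        scores.extend([LIKERT_SCORES[answer]] * count)
    return scores


scores = expand_scores(productivity_counts)
mean_score = sum(scores) / len(scores)
median_score = median(scores)

if __name__ == "__main__":
    print(f"n = {len(scores)}, mean = {mean_score:.2f}, median = {median_score}")
```

The mean of 25/9, roughly 2.8, lies slightly below the neutral score of 3, which matches the divided opinions described above.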
perceived ease of use
The respondents had a more homogeneous opinion on the perceived ease of use of the robot in general. This entails the ease with which the elderly person can give commands to the robot by voice commands or by commands executed via a tablet interface. Half of the respondents agreed that they could communicate better with the robot if it is equipped with emotion recognition software, while the other half disagreed or even strongly disagreed with this statement.
study-specific questions
No respondent would talk to the robot about the weather, but 4 respondents would tell the robot how they are feeling. This is a fairly remarkable result, as feelings are more personal than the weather. Most (5) respondents would not like the robot to suggest actions to them based on their mood. Most importantly, most of them would not like the robot to take audio and video recordings, even if they are deleted afterwards and these recordings cannot be shared with their carers without their permission.
Discussion
The respondents' attitudes towards the usefulness of elderly care robots in general differed a lot. This might be because our respondents are not yet dependent on 24/7 care. This seems to agree with the findings of [35]: 28.2% of the participants in their 'healthy people of older age' group had an interest in a robotic companion. They mentioned that the robot could be interesting to lonely and isolated elderly, but not for themselves.
Our respondents were positive that they would be able to easily use the 'SocialRobot'. However, our survey was digital, which already indicates that the elderly people who filled it out are quite digitally competent. For the envisioned target group, it is possible that it would not be that easy for them to instruct the robot. This concern was also expressed by researchers from the SocialRobot project [36]: "The elderly are all convinced that learning new skills for using technical equipment is a serious problem". In such cases, the emotion recognition software might help out a lot in the communication, although our respondents were not that enthusiastic about this feature.
The respondents' dislike of the robot suggesting actions based on their mood probably arises from their sense of dignity. This conclusion can be taken as Pino et al suggest a very similar result: "Concerns about dignity were mainly pointed out by healthy older aged participants (47.9%)". They further mention that this is because of possible infantilization of elderly people.
Our proposal to delete the video and audio recordings, and not share this information with carers without permission from the elderly person, was not enough to ensure the feeling of privacy. This is a major issue, that should be investigated more thoroughly.
Suggestions for further research
Unfortunately, we were not able to conduct this survey on a larger and more diverse sample (including elderly people who do not live independently). This prevented a statistical analysis of the results, which we would suggest for further research, as well as a subdivision of the elderly into different categories similar to that of Pino et al.:
- Mild Cognitive Impairment (MCI): These participants are over the age of 70, have a clinical diagnosis of MCI and do not live independently. They do not have any other medical condition or psychiatric disorder that precludes their participation in the survey.
- Healthy Older Adults-independently living (HOA-I): These participants are over the age of 70 and have no diagnosed cognitive impairments. They live independently.
- Healthy Older Adults-assisted living (HOA-II): These participants are over the age of 70, have no diagnosed cognitive impairments, but do live in an elderly home for other reasons.
It would be interesting to analyze the results of the survey based on the category that the elderly people belong to.
Besides the target group of our survey, the focus of the questionnaire could also be expanded. As it is important to keep the survey concise to prevent respondent fatigue, we suggest diving into one of the mentioned topics at a time.
Firstly, we suggest diving deeper into the privacy issues associated with facial emotion recognition technology, as these do not seem to be resolved with our precautions. Besides that, it is important to re-evaluate the PEOU and the PU among the other respondent categories. Lastly, the topic of autonomy deserves more attention, as this topic has mostly been omitted from our survey. To what extent would the elderly be fine with the robot making decisions for them, or should the robot always ask for their permission (even in potentially life-threatening situations)?
Interview
Values like privacy, social interaction, relevance, and improvement of care should be reflected in the design of a social robot for elderly care. These values could also influence the process by which such robots are introduced to the market. We explored the opinions of secondary users by interviewing five people who work in elderly care: a nurse of individual health care (verzorgend IG), who assists the elderly throughout the day by caring for them, doing activities with them and giving them food and drinks, and who is responsible for the care of the people in a department of a care home; a student who works as an assistant in elderly care and helps the elderly with their daily routines; an assistant nurse, who entertains the elderly during the evening; a staff member of education at a care home, who trains interns and visits the departments of the care home to stay up to date on changes in the care of the elderly; and a home care nurse. A social robot can, for example, act as a companion for elderly people who live alone at home, in a way that other technology cannot. However, the robot must function as the user wishes. These interviews were conducted to learn how these other stakeholders think about the robot and its functions, and whether they have additional advice.
Privacy (and safety)
All interviewees agreed that a flap covering the robot's eyes when it is not recording is desirable, because it makes clear to everyone whether the robot is recording. An on/off switch is also useful, so that the robot can be switched off when it is not needed. The district nurse said that employees do not need to know whether the robot is recording, because they usually do not spend much time at the home of the elderly person. For the elderly themselves, however, it is really important to be certain whether the robot is recording.
Social interaction
“There are people who like the idea of having a robot in their home that can assist them, but human to human contact is very important and a robot is not a human so cannot replace this.”
“The robot can stimulate the user to be more positive in life.”
The robot can help the elderly by analyzing their emotions and reacting to them appropriately, but the danger is that the elderly lose their social contacts and only interact with the robot. Elderly people who live alone at home can feel less lonely when the robot is present and interacts with them. Moreover, it is important that the user can interact with the robot, because this probably helps the user to understand the robot better.
Relevance
It is a good idea to introduce such a robot in a care home or in the homes of elderly people, because it gives elderly people the feeling that they are understood, and they feel less lonely. When the robot gives the right information, complaints can be resolved much faster. The robot can also contribute to the lives of elderly people when it is adjusted to its user. Moreover, the elderly may be happy to have a conversation partner, and the home is less quiet when the robot is present. The robot can be a good alternative to a dog, because it can function as a companion while, unlike a dog, it does not have to be walked. The robot can mainly serve to increase the well-being of elderly people. In conclusion, it can improve the lives of the elderly, provided they are not afraid of the robot.
“Such a robot really can be a good addition in the care for elderly people in a care home or at home.”
“Hmm, it can be a good robot. However, it is difficult to say if it is a good idea to implement such a robot because does the robot work as planned?”
“It is a horrible thought to use robots to care for people. Although, it is better than nothing, but rather do not use robots.”
The checkups that the robot can perform can be really useful, as the robot can take over tasks and relieve the nurses. It is better that the robot does checkups at fixed intervals, for example once every hour, rather than recording 24/7, because continuous recording can give elderly people the feeling that the robot is hounding them and giving them unwelcome attention. Moreover, when the daily rhythm of the user is observed, the robot knows when checkups throughout the day are necessary. On the other hand, the student stated that checkups by the robot are not necessary in a care home, because the checkups done by the nurses are sufficient. In a home situation, recording 24/7 might be useful. The nurses should be the ones in control.
“The robot cannot drop out if it observed that the elderly person is not doing well. It must take action.”
The workload will also decrease, because the robot can function as a memory aid for elderly people. The robot can be programmed to take over the planning of its user and thereby help the user through the day. For example, the robot can remind the elderly to take their medication. However, the student said that the workload will not necessarily decrease, because nurses can also assess a person's emotions themselves. Nevertheless, the robot is useful for calling, for example, the general practitioner, so that the nurses do not have to do this themselves and do not lose extra time.
Improvement of care
The care for people can improve, especially for lonely elderly people. They are often homebound and do not have many social contacts, so the robot can function as a companion for them. Also, by knowing the emotional state of the elderly, care can be better adjusted to them and will thereby improve. Moreover, care for elderly people can improve because the robot can take over the task of reporting how the elderly person is doing. Time can then be spent on other things instead of on visiting clients only to check how they are doing.
“When little tasks are taken over by the robot, the nurses can focus on the care for the people and can give more patient-oriented care.”
Furthermore, people with dementia are unpredictable because their mood changes. Such a robot can react to this more easily and help the person. The robot does not judge and may therefore come across as more empathetic at such moments. Elderly people can receive more patient-oriented care when the robot determines their emotions. For people with dementia, such a robot can also be necessary during the night: these people can be restless or awake at night, and based on their emotion, as analyzed by the robot, a nurse can decide whether or not to intervene. The staff member of education, however, had a different opinion. People with dementia often relive the years of their youth, so it is possible that they are afraid of the robot. In that case, care will not improve when robots are used.
“It is possible that the robot can assist but it is necessary that the robot can act adequately.”
Appearance
Opinions on the appearance of the robot differed slightly.
“The robot looks a little bit boring, so design it more colorful or cuddlesome.”
“The appearance of the robot is quite funny. It is good looking for its function.”
“The appearance of the robot is very neutral, not distracting, and fits its function.”
“If the robot has the appearance of an animal, it is already better than when it looks more like a human. That is what it makes scary. If the robot of this research looks like the robot of the picture, it looks quite friendly because it resembles a snowman.”
“The robot looks funny but design it not too white because that makes the robot looks too sterile. Also, the robot does not scare off because it has no arms. It looks a little bit like a penguin.”
Conclusion
Four of the five interviewees were positive about the robot and its applications. One was negative about the robot and indicated that she is not a fan of robots in care systems. However, she admitted being open to the development of robots and said that there could be elderly people who really want such a robot in their life. The conclusion of the interviews is best summarized by one of the interviewees:
“If it is possible to put the robot on and off according to peoples wishes, the robot really can be received positively. Additionally, the next generation, that is eligible for this technology, is more known to this kind of technology and, therefore, will be more willing to enter such a robot in their life.”
Discussion and further research
All the interviewees work for the same organization. In further research, people who work at other care homes and care organizations could therefore be interviewed. Moreover, more people who work in care homes or as caregivers could be asked for their opinion, to get a better overview of everyone who may encounter the robot. In this research, the target group was lonely elderly people. During the interviews, people with dementia and possible applications of the robot in their care were also mentioned, so this group could be investigated in further research. This research did not focus on people with dementia because it is difficult to obtain their consent and the project was too short to arrange this.
Conclusion
We have developed facial emotion recognition software using CNNs. By employing transfer learning on the VGG16 neural network, we wrote a program that recognizes seven emotions and can evaluate the certainty of its own prediction. The network was trained on the FacesDB dataset, which contains pictures of 35 participants, five of whom are elderly. The program seems to work better on younger faces: in the test on the elderly from the database, only 11 out of 35 pictures were predicted correctly. Fortunately, after adding pictures of two group members to the database, the overall accuracy of the network increased significantly. Predictions on the faces of these group members could be made easily, as was shown during a real-time demo on Monday, June 17. We conclude that our network does not reach the required accuracy after training on the dataset alone, but that this accuracy might still be reached by additionally training on the subject.
We have looked at possible applications of our network and researched the acceptance of a companion/care robot with our technology. A survey was distributed among a group of independently living elderly people between 70 and 86 years old. Their opinions on the usefulness of the robot differ, but we could conclude that there is a market for our product. Two critical points came to light after the survey. In general, the participants indicated that, despite the measures we took, their feeling of privacy would be affected when using the robot. Besides that, they would feel infantilized when the robot suggests actions based on their emotional state. These points will need further attention before this technology can be implemented.
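The following minimal sketch illustrates how a seven-class prediction with a certainty value, and an accuracy such as 11 out of 35, can be computed. It is not our actual implementation: it assumes that the certainty is read off as the highest softmax probability, the label names and their order are hypothetical, and the example scores are made up.

```python
import math

# Hypothetical label order; the names of the seven classes are an
# assumption made for illustration only.
EMOTIONS = ["neutral", "joy", "sadness", "surprise", "anger", "disgust", "fear"]


def softmax(scores):
    """Turn raw network scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_certainty(scores):
    """Return the most likely emotion and its probability (assumed certainty)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Made-up scores for a single picture, not real network output.
    label, certainty = predict_with_certainty([0.2, 2.5, 0.1, 0.4, -1.0, -0.5, 0.0])
    print(label, round(certainty, 2))
    # 11 correct predictions out of 35 pictures of elderly faces.
    print(round(11 / 35, 3))  # 0.314
```

A prediction with a low certainty value could, for example, be ignored by the robot instead of acted upon.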
We also spoke with caregivers from elderly care homes and asked for their opinion on a companion/care robot with our technology. To our surprise, they did not seem to be concerned about being recorded themselves. They were positive about our idea to put a physical barrier in the form of a flap over the robot’s camera when it is not recording. Most of the caregivers were overall positive about the implementation of such a care robot in elderly care. But we also heard serious concerns from professionals who fear negative consequences when robots take over elderly care.
In summary, there is a need for facial emotion recognition software which is specialized to be applied to elderly people in the context of care homes. This specialization should not only be sought in training the CNN to perform well on the elderly, but also in the protection of dignity and the feeling of privacy of the subject. We made a first step towards such software in this project. Our results are still far from perfect, but they show that with the current technology this goal is reachable.
Overall discussion
Unfortunately, we could not conduct the survey at the elderly home as planned, because the director did not allow it. He explained that his company stands for high-quality care for the elderly and that, in his view, the use of care robots in elderly care conflicts with this. Of course, we understand that he does not want to give his patients the feeling that the care home is considering replacing nurses with robots, a (shortsighted) conclusion that might be drawn if he decided to work with us.
Sources
- Holwerda, T. J., Deeg, D. J., Beekman, A. T., van Tilburg, T. G., Stek, M. L., Jonker, C., & Schoevers, R. A. (2014). Feelings of loneliness, but not social isolation, predict dementia onset: results from the Amsterdam Study of the Elderly (AMSTEL). J Neurol Neurosurg Psychiatry, 85(2), 135-142.
- Portugal, D., Santos, L., Alvito, P., Dias, J., Samaras, G., & Christodoulou, E. (2015, December). SocialRobot: An interactive mobile robot for elderly home care. In 2015 IEEE/SICE International Symposium on System Integration (SII) (pp. 811-816). IEEE.
- da Silva Costa, L. M. (2013). Integration of a Communication System for Social Behavior Analysis in the SocialRobot Project.
- Han, M. J., Hsu, J. H., Song, K. T., & Chang, F. Y. (2008). A New Information Fusion Method for Bimodal Robotic Emotion Recognition. JCP, 3(7), 39-47.
- Draper, H. & Sorell, T. (2017). Ethical values and social care robots for older people: an international qualitative study. Ethics and Information Technology, 19(1), 49-68.
- Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010, June). The extended Cohn-Kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops (pp. 94-101). IEEE.
- Liu, Z., Wu, M., Cao, W., Chen, L., Xu, J., Zhang, R., ... & Mao, J. (2017). A facial expression emotion recognition based human-robot interaction system.
- Ruiz-Garcia, A., Elshaw, M., Altahhan, A., & Palade, V. (2018). A hybrid deep learning neural approach for emotion recognition from facial expressions for socially assistive robots. Neural Computing and Applications, 29(7), 359-373.
- Jain, D.K., Shamsolmoali, P., Sehdev, P. (2019). Extended deep neural network for facial emotion recognition. Pattern Recognition Letters, 120, 69-74.
- Han, M. J., Hsu, J. H., Song, K. T., & Chang, F. Y. (2008). A New Information Fusion Method for Bimodal Robotic Emotion Recognition. JCP, 3(7), 39-47.
- Moghaddam, B., Jebara, T., & Pentland, A. (2000). Bayesian face recognition. Pattern Recognition, 33(11), 1771-1782.
- Fan, P., Gonzalez, I., Enescu, V., Sahli, H., & Jiang, D. (2011, October). Kalman filter-based facial emotional expression recognition. In International Conference on Affective Computing and Intelligent Interaction (pp. 497-506). Springer, Berlin, Heidelberg.
- Maghami, M., Zoroofi, R. A., Araabi, B. N., Shiva, M., & Vahedi, E. (2007, November). Kalman filter tracking for facial expression recognition using noticeable feature selection. In 2007 International Conference on Intelligent and Advanced Systems (pp. 587-590). IEEE.
- Wang, F., Chen, H., Kong, L., & Sheng, W. (2018, August). Real-time Facial Expression Recognition on Robot for Healthcare. In 2018 IEEE International Conference on Intelligence and Safety for Robotics (ISR) (pp. 402-406). IEEE.
- Sullivan, S., & Ruffman, T. (2004). Emotion recognition deficits in the elderly. International Journal of Neuroscience, 114(3), 403-432.
- Sahin, N. T., Keshav, N. U., Salisbury, J. P., & Vahabzadeh, A. (2018). Second Version of Google Glass as a Wearable Socio-Affective Aid: Positive School Desirability, High Usability, and Theoretical Framework in a Sample of Children with Autism. JMIR human factors, 5(1), e1.
- Broekens, J., Heerink, M., & Rosendal, H. (2009). Assistive social robots in elderly care: a review. Gerontechnology, 8(2), 94-103.
- Pioggia, G., Igliozzi, R., Ferro, M., Ahluwalia, A., Muratori, F., & De Rossi, D. (2005). An android for enhancing social skills and emotion recognition in people with autism. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 13(4), 507-515.
- Espinosa-Aranda, J., Vallez, N., Rico-Saavedra, J., Parra-Patino, J., Bueno, G., Sorci, M., ... & Deniz, O. (2018). Smart Doll: Emotion Recognition Using Embedded Deep Learning. Symmetry, 10(9), 387.
- Wang, F., Chen, H., Kong, L., & Sheng, W. (2018, August). Real-time Facial Expression Recognition on Robot for Healthcare. In 2018 IEEE International Conference on Intelligence and Safety for Robotics (ISR) (pp. 402-406). IEEE.
- Lynn Kacperek. (2014, December). Non-verbal communication: the importance of listening. Retrieved May 5, 2019, from https://www.magonlinelibrary.com/doi/abs/10.12968/bjon.1997.6.5.275
- Argyle, M. (1972). Non-verbal communication in human social interaction. In R. A. Hinde, Non-verbal communication. Oxford, England: Cambridge U. Press. Retrieved May 5, 2019, from https://psycnet.apa.org/record/1973-24485-010
- Broekens, J., Heerink, M., & Rosendal, H. (2009). Assistive social robots in elderly care: a review. Gerontechnology, 8(2), 94-103.
- Fowler, G. A. (2019, May 6). Alexa has been eavesdropping on you this whole time. The Washington Post. Retrieved from https://www.washingtonpost.com/technology/2019/05/06/alexa-has-been-eavesdropping-you-this-whole-time/?utm_term=.7fc144aca478
- Eyben, F., Wöllmer, M., & Schuller, B. (2009, September). OpenEAR - introducing the Munich open-source emotion and affect recognition toolkit. In 2009 3rd international conference on affective computing and intelligent interaction and workshops (pp. 1-6). IEEE.
- Wikipedia. Facial expression databases. 16 March 2019 https://en.wikipedia.org/wiki/Facial_expression_databases
- D. Aneja, A. Colburn, G. Faigin, L. Shapiro, and B. Mones. Modeling stylized character expressions via deep learning. In Proceedings of the 13th Asian Conference on Computer Vision. Springer, 2016.
- A. Mollahosseini; B. Hasani; M. H. Mahoor, "AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild," in IEEE Transactions on Affective Computing, 2017.
- FacesDB website, http://app.visgraf.impa.br/database/faces/
- Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
- Pypi, "OpenCV", https://pypi.org/project/opencv-python/
- F. Leblanc, “Autocropper”, Github, https://github.com/leblancfg/autocrop
- Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS quarterly, 319-340.
- Koceski, S., & Koceska, N. (2016). Evaluation of an assistive telepresence robot for elderly healthcare. Journal of medical systems, 40(5), 121.
- Pino, M., Boulay, M., Jouen, F., & Rigaud, A. S. (2015). “Are we ready for robots that care for us?” Attitudes and opinions of older adults toward socially assistive robots. Frontiers in aging neuroscience, 7, 141.
- Partner, R., & Christophorou, C. Specification of User Needs and Requirements.