PRE2019 4 Group6: Difference between revisions

From Control Systems Technology Group
Once the presence of each emotion has been detected, along with whether the individual is feeling positive or negative and with what arousal, the robot can use this information. How it uses this information is very important, as history has shown that a robot that does not meet the needs and desires of its potential users will go unused.
Therefore, based on the survey, three different trajectories were chosen for which the chatbot has responses. These three can be seen in the picture below:
[[File:Trajects_Emotions.jpg|400px|Image: 800 pixels|center|thumb|Figure 20: Trajectories in Valence and arousal diagram]]
Trajectory 1 is where the user has quite high arousal and low valence. How to deal with these emotions is based on the survey results and literature<ref name="PSy">Glancy, G., & Saini, M. A. (2005). An evidenced-based review of psychological treatments of anger and aggression. Brief Treatment & Crisis Intervention, 5(2).</ref>. From this, the most important factor in dealing with these strong emotions is to listen to the person and ask questions about why he or she is feeling angry. Therefore, the main thing the robot does is ask whether something is bothering the person and whether it can help. If the user confirms, the robot immediately switches to a special environment where a psychiatric chatbot has been set up. This chatbot is based on the ELIZA bot, with additional functionalities added from the ALICE bot. The ELIZA bot is quite old but does exactly what the users want: listen and ask questions. It works by rephrasing the input of the user in such a way that it evokes a continuous conversation<ref name="Elize">Joseph Weizenbaum. 1966. ELIZA—a computer program for the study of natural language communication between man and machine. Commun. ACM 9, 1 (Jan. 1966), 36–45. DOI:https://doi.org/10.1145/365153.365168</ref>. An example of user interaction can be seen in the figure below:
[[File:Example 3 bot.png|400px|Image: 800 pixels|center|thumb|Figure 21: Example interaction Trajectory 1]]
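The rephrasing idea behind ELIZA can be illustrated with a minimal sketch. It is not the chatbot used in this project: the rule list, the pronoun table and the function names `reflect` and `respond` are hypothetical and only show how pattern matching plus pronoun swapping turns a statement into a follow-up question.

```python
import re

# Hypothetical pronoun swaps; Weizenbaum's original script was far richer.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my", "i'm": "you are",
}

# (pattern, response template) pairs, tried in order; the last one always matches.
RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r".*\bbecause (.*)", re.I), "Is that the real reason?"),
    (re.compile(r"(.*)", re.I), "Please tell me more."),
]


def reflect(fragment):
    """Swap first- and second-person words so the fragment can be echoed back."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(user_input):
    """Return the response of the first rule whose pattern matches the input."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please tell me more."


print(respond("I feel angry about my neighbour."))
```

Because every reply is a question built from the user's own words, the conversation keeps going without the bot needing any real understanding of the topic.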


Trajectory 2 is when the person is sad with low to neutral arousal. From the survey and literature, it became clear that in this state there are many more things the robot should be able to do. In the survey, people suggested that the robot should ask questions, but also that it should be able to suggest some physical exercises, tell a story and do a few other things. These have all been implemented in this trajectory. An interaction could look like the following:
[[File:Example 4 bot.png|400px|Image: 800 pixels|center|thumb|Figure 22: Example interaction Trajectory 2(1/2)]]
[[File:Example 5 bot.png|400px|Image: 800 pixels|center|thumb|Figure 23:  Example interaction Trajectory 2(2/2)]]
Trajectory 3 is when the person has high arousal with a slightly negative valence. In terms of what the robot does, this lies in between the first two trajectories. As in trajectory 1, talking and listening to the user is very important. However, due to the higher valence, talking and listening is not always wanted. Therefore, the robot should first ask whether the person wants to talk about their feelings or would rather do something else. If the person wants to do something else, all the options the robot offers are listed, just as in trajectory 2. An interaction could look like the following:
[[File:Example 6 bot.png|400px|Image: 800 pixels|center|thumb|Figure 24: Example interaction Trajectory 3(1/2)]]
[[File:Example 7 bot.png|400px|Image: 800 pixels|center|thumb|Figure 25: Example interaction Trajectory 3(2/2)]]
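How a valence and arousal estimate could be mapped to one of the three trajectories is sketched below. This is only an illustration: the value range of -1 to 1, the thresholds and the function name `select_trajectory` are assumptions, since the exact region boundaries are defined by the figure and are not stated in the text.

```python
def select_trajectory(valence, arousal, strong_negative=-0.5):
    """Map a (valence, arousal) estimate to a trajectory number, or None.

    Assumes both values lie in [-1, 1]; the thresholds are hypothetical.
    """
    if valence >= 0:
        return None  # positive valence: no supportive trajectory is started
    if arousal <= 0:
        return 2     # sad, low to neutral arousal
    if valence < strong_negative:
        return 1     # high arousal, low valence (e.g. anger)
    return 3         # high arousal, slightly negative valence


print(select_trajectory(-0.8, 0.7))  # 1
print(select_trajectory(-0.4, -0.3))  # 2
print(select_trajectory(-0.2, 0.6))  # 3
```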


==Chatbot implementation==

Revision as of 18:34, 25 June 2020


R.E.D.C.A. (Robotic Emotion Detection Chatbot Assistant)

Group Members

Name | Student ID | Study | Email
Coen Aarts | 0963485 | Computer Science | c.p.a.aarts@student.tue.nl
Max van IJsseldijk | 1325930 | Mechanical Engineering | m.j.c.b.v.ijsseldijk@student.tue.nl
Rick Mannien | 1014475 | Electrical Engineering | r.mannien@student.tue.nl
Venislav Varbanov | 1284401 | Computer Science | v.varbanov@student.tue.nl

Introduction

For the past few decades, the percentage of elderly people in the population has been rapidly increasing. One of the contributing factors is the advancement of the medical field, which allows people to live longer lives. The percentage of the world population aged above 60 years is expected to roughly double in the coming 30 years [1]. This growth will bring many challenges that need to be overcome. Simple solutions are already being implemented today, e.g. people generally work until an older age before retiring.

Problem statement

Currently, the increasing number of elderly people in care homes already affects the time caregivers have available for each patient. Most of their time is spent on vital tasks such as administering medicines, leaving less room for social tasks such as having a conversation. Ultimately, patients in care homes will spend more hours of the day alone. Combined with the fact that the prevalence of depressive symptoms increases with age [2], and that loneliness is one of the three major factors leading to depression, suicide and suicide attempts [3] [4], this results in a rather grim perspective for the elderly in care homes. A simple solution would be to ensure that there are enough caregivers to combat the staff shortage. Unfortunately, the shortage has been increasing over the past years. It is therefore vital, for the benefit of the elderly, to find a solution that aids caregivers in any way, shape, or form. An ideal solution would be to augment care-home staff with tireless robots capable of doing anything a human caregiver can. However, since robots are not likely to be that advanced any time soon, one could develop a less sophisticated robot that takes over some of the caregiver's workload rather than all of it. New advances in Artificial Intelligence (AI) research might help here, as AI can be used to perform complex human-like tasks. This enables the creation of a robot capable of monitoring a person's mental state and processing this information to possibly improve that state if it drops below nominal levels. There are different kinds of interactions the robot could perform to achieve this goal. For instance, the robot could have a simple conversation with the person or route a call to relatives or friends. For more complex interactions and conversations, the robot needs to understand more about the emotional state of the person.
If the robot is able to predict the emotional state of the person reliably, the quality of the interactions may improve greatly. Conversely, poor predictions of the person's mental state will likely cause frustration or other negative feelings, which is highly undesirable.

Objective

The proposal for this project is to use emotion detection in real-time in order to predict how the person is feeling. Using this information the AI should be able to find appropriate solutions to help the person. These readings will not be perfect as it is a fairly hard task to predict the emotional state of a human being, but promising results can be seen when neural networks are used to perform this task [5].

Using a neural network to predict the emotional state of a human is not a new invention. However, using this information in a larger system that also includes other components, such as a chatbot, is less researched. The research done during this project might therefore result in new findings that can be used in the future. Alternatively, its results could corroborate existing research with similar findings.

The robot should comprise the following components:

  • Human emotional detection software
  • Chatbot function
  • Feedback system that tracks the effect of the robot

As for its name, it will from now on be referred to as R.E.D.C.A. (Robotic Emotion Detection Chatbot Assistant).

State of the Art

This is most likely not the first research on AI technology for emotion recognition and chatbot interactions. Before we proceed with our own findings and development, it is worthwhile to look back at earlier papers and developments concerning the combination of components we wish to implement. Below one may find the state of the art of facial recognition and chatbot interactions, with a summary of each state-of-the-art work. Additionally, their limitations are assessed so that we can understand their design and work around them.

Previous Work

Facial Detection



Facial detection combined with emotion recognition software has been done before. One method that could be employed for facial detection is shown in the following block diagram:

Figure 1: Block diagram of facial recognition algorithm[6]

This setup was proposed for a robot to interact with a person based on the person's emotion. However, the verbal feedback was not implemented.

Emotion Recognition

At the time of writing, there is no single agreed-upon way of comparing emotion recognition algorithms. Therefore, it is not known which emotion recognition method is the most accurate in practice. However, researchers keep track of how the algorithms from their papers perform on certain datasets of face images[7] in which the emotion is known, i.e. the so-called emotion recognition datasets. For each dataset, the accuracy scores (percentage of correctly classified images) of the best methods tried on that dataset are shown, with links to the papers and in some cases to the code. So although we cannot speak of a state-of-the-art algorithm in general, we can refer to a state-of-the-art method for a given emotion recognition dataset (the one with the highest proven accuracy for that dataset).

Chatbot

Siri

Siri is a virtual assistant made by Apple. It houses a lot of diverse interaction options using its advanced machine learning framework. Siri offers options to set alarms, have a simple conversation, schedule meetings or check the weather.

Mitsuku

Mitsuku is the current five-time winner of the Loebner Prize Turing Test, meaning that it has repeatedly been judged the most humanlike chatbot in that competition. It uses Artificial Intelligence Markup Language (AIML), which is specifically designed for making chatbots. It can be chatted with for free on its website.[8]

Care Robots

Many varieties of care robots have been integrated into our society. They vary from low-level automata to high-level designs that keep track of patient biometrics. For our problem, we focus on the more social care robots that interact with users on a social level. A few relevant robots are PARO, ZORA, ROBEAR and PEPPER.


PARO[9] is a therapeutic robot resembling a seal. Based on animal therapy, its purpose is to create a calming effect and to elicit emotional reactions from patients in hospitals and nursing homes. It is able to interact with patients via tactile sensors and contains software to look at users, remember their faces and express emotions.
Figure 3: PARO being hugged
ZORA[10], which stands for Zorg, Ouderen, Revalidatie en Animatie, is a small humanoid robot with a friendly haptic design. It is primarily dispatched to Dutch nursing homes to engage the elderly in a more active lifestyle. It can dance, sing and play games with the user, and its design has been shown to evoke positive associations in the user. Additionally, ZORA is used for autistic children to interact with, as it has predictable movement behavior and a consistent intonation.
Figure 4: ZORA in action
ROBEAR[11] is a humanoid care robot portraying a bear. Its design and function are to aid patients with physical constraints in their movement, such as walking, and to lift patients into their wheelchair. Its friendly design sparks positive associations in the user and makes it a lovable and useful robot.
Figure 5: ROBEAR demonstration
PEPPER[12] is a humanoid robot of medium size. It was introduced in 2014 and its primary function is to detect human emotion from facial expressions and the intonation of one's voice. Initially, Pepper was deployed in offices as a receptionist or host. As of 2017, the robot has been a focal point of research on assisting the elderly in care homes.

Limitations and issues

There are several limitations that need to be taken into account when designing the robot. The currently active COVID-19 pandemic will definitely affect the research, especially since our target group is elderly people. They are the most vulnerable to the virus, and physical meetings with our users are therefore highly discouraged. This would matter less if the elderly were not also the least tech-savvy group, which means that online meetings with them might prove hard to realize.

That being said, there are other limitations to take into account. Ideally, the chatbot and software in the system would be developed especially for this project, so that the robot can help the user properly using the right actions. Designing such a chatbot requires many hours of work and expertise beyond the level we can deliver. Therefore, a more simplistic chatbot will be designed that can be used in the proof of concept. Like the chatbot, the robot would need a custom-designed appearance to function properly. However, the group will spend its time on the software of the robot, at the cost of designing the hardware.

Emotion recognition is an important part of the robot's functioning. A limitation here might be that the current state-of-the-art software is not able to do this reliably. This will be researched.

R.E.D.C.A. design

R.E.D.C.A. must also have an initial design that we wish to pursue as a prototype. From what we researched and elaborated on in the Previous Work section of the State of the Art, Pepper is the design closest to our goal, with state-of-the-art sensors and actuators for speaking and listening. Our aim is to implement R.E.D.C.A. as such a robot.

Figure 6: Care robot Pepper

USE Aspects

Figure 7: Stakeholders in Facial Recognition Robot

Researching and developing facial recognition robots requires taking into account which stakeholders are involved in the process. These stakeholders will interact with, create or regulate the robot and may affect its design and requirements. The user, society and enterprise stakeholders are put into perspective below.

Users

Lonely elderly people are the main users of the robot. They are reached via a government-funded institute where applications for such a robot can be submitted. An application can be filed by anyone (the elderly themselves, friends, psychiatrists, family, caregivers), provided the elderly person consents. In cases where the health of the elderly person is in danger due to illness (e.g. the final stages of dementia), their consent is not necessary if the application is filed by doctors or psychiatrists. Where applicable, a caregiver or employee of the institute will visit the elderly person to check whether the care robot is really necessary or whether a different solution can be found. If the application is approved, the elderly person is assigned a robot. They will interact with the robot daily by talking to it, and the robot will build an emotional profile of the person, which it uses to help the person through the day. When the robot detects certain negative emotions (e.g. sadness or anger), it can ask whether it can help with various actions such as calling family, friends or real caregivers, or having a simple conversation with the person.


Society

Society consists of three main stakeholders: the government, medical assistance and visitors. The government is the primary funder of an institute that regulates the distribution of the emotion-detecting robots. This set-up makes the distribution easier to regulate, which is needed because real-time emotion detection intrudes considerably on privacy. Furthermore, the government is accountable for making laws that regulate what the robots may do, and how and what data can be sent to family members or third parties. Secondly, the robots may deliver feedback to hospitals or therapists in case of severe depression or other negative symptoms that cannot simply be solved by a conversation. The elderly person, for whom autonomy remains the primary value, must always give consent for sharing data or calling certain people. For people with severe illnesses, this can be overruled by doctors or psychiatrists to make sure the person gets help in an emergency. Finally, any visiting individual may indirectly be exposed to the robot. To ensure that their emotions are not measured unwillingly and their privacy is not compromised, laws and regulations must be set up.

Enterprise

Robots must be developed, created and dispatched to the elderly. The relevant enterprise stakeholders, in this case, are the developing companies, the government, the hospitals and therapists to ensure logistic and administrative validity.

User Requirement

In order to develop R.E.D.C.A., several requirements must be met for it to function appropriately with its designated user. The requirements are split into the sections below.

Emotion Recognition

  • Recognise the user's face.
  • Calculate the emotion for the user's face only.
  • Translate the result into useful data for the chatbot.
  • Perform the steps above in quick succession to allow a natural conversation flow.

Chatbot

  • Engage the user or respond
  • High level conversation ability.
  • Preset options for the user to choose from.
  • Store user data to enhance conversation experience.
  • Able to call caregivers or guardians for emergencies.
  • Allows user to interact by speech or text input/output.

Hardware

  • Adaptable to traverse through an apartment floor.
  • Adjustable head height for ease of conversation/face recognition.
  • Lasting battery lifespan of a week.
  • Able to reset personal user data.

Miscellaneous

  • Able to store and transmit data for research if user opted-in.



Approach

This project has multiple problems that need to be solved in order to create a system or robot that is able to combat the emotional problems the elderly are facing. To categorize them, the problems are split into three main parts:

Technical

The main technical problem for our robot is to reliably read the emotional state of another person and to process that data. After processing, the robot should be able to respond by choosing from a set of different actions.

Social / Emotional

The robot should be able to act appropriately; therefore, research is needed into which types of actions the robot can perform to get positive results. For example, the robot could have a simple conversation with the person or start playing an audiobook in order to keep the person active during the day.

Physical

What type of physical presence of the robot is optimal? Is a more conventional robot with a somewhat humanoid look needed? Or does a system that interacts through speakers and screens distributed over the room get better results? Perhaps a combination of both works best.

The main focus of this project will be the technical problem stated above; however, for a more complete use case, the other subjects should be researched as well.


Papers on Emotions, Creating the AI and creating the Chatbot

Survey

Implementing a social robot in daily life is not an easy task. Having assessed our USE stakeholders, the robot must now cater to their needs, in particular the needs of the elderly (the primary users) as well as those of caregivers and relatives. To gain more insight into their opinions on a social support robot such as R.E.D.C.A., the functionalities it should contain and its desired behaviors, a survey was sent to these groups. Care homes, elderly people and caregivers we know personally, as well as national ones willing to participate, were asked to answer a set of questions. There were two sets of questions. The first consisted of closed questions asking participants for their level of agreement with certain robot-related statements, for example "the robot can engage the user in conversation". This level ranges from Strongly Disagree (0%) to Strongly Agree (100%), with 50% being a grey area where the participant does not know. The second set consisted of open-ended questions asking participants what they would expect the robot to do when a hypothetical user is in a specified emotional state.

Below are the resulting graphs by accumulating all survey data.

The data of the first set of questions has been processed into a bar chart (figure 8), indicating the average level of agreement of the total population per statement. This data has also been grouped per occupation (figure 9) for a more specific overview of the participants' opinions.

Figure 8: Survey Results - Total Population
Figure 9: Survey Results - Per Occupation


Additionally, a third graph has been made to display the preferred actions for certain emotional states of a hypothetical user. As these were open questions, the answers ranged from precisely specified to vaguely described. To be able to process them into the chart below (figure 10), the answers have been generalised, with multiple different answers put in the same category. For example, if one participant wrote "speak to user" and another "tell the user joke", both have been categorised as Start conversation. Specific answers have still been taken into consideration when creating the chatbot. Overall, there have been no strong signs of disdain towards the proposed statements, which may be a signal to implement them.


Figure 10: Survey Results - Preferred Action on Emotion

We can use the survey results to adapt R.E.D.C.A. and its chatbot functionality to its users' needs. As the first charts indicate, many appreciate having the robot call for help if required, as well as being able to call a person of interest. Additionally, a large portion of the surveyed elderly are interested in having R.E.D.C.A. in their home, which stimulates our research and reinforces the demand for such a robot. As for the preferred action, many found that asking questions is the way to go when people are in a sad or angry emotional state, and that the robot should remain idle when the user is in a happy state.

This concludes the survey. All these findings have been considered and implemented into R.E.D.C.A.'s chatbot functionality as elaborated below.

Framework

In this section the total framework is elaborated upon. The picture below shows this framework as a flowchart. There are two inputs to the system: the emotion expressed on the face of the user and the vocal input of the user. Due to limited time, speech-to-text and text-to-speech were not implemented. However, the Pepper robot design already has a built-in text-to-speech and speech-to-text system, so these would not need to be built again. How the individual modules work is explained in the sections below.


Figure 11: Framework of R.E.D.C.A.

Emotion recognition design

In order to determine the emotion of a person given an image in which the front of the face of that person is shown, two separate algorithms need to be used. The first one is to find the exact location of the face within the image (face detection). The second algorithm is used to determine which emotion (of a given set of emotions) is shown by the person's face (emotion recognition).

Face detection

Face detection is a computer technology that identifies human faces in digital images. It is a specific case of object-class detection, where the goal is to find the locations and sizes of all objects in an image that belong to a given class. Face detection focuses on the detection of frontal human faces. In our case, in order to be detected, the entire face must point towards the camera and should not be tilted to either side. Given that the detection is followed by a recognition step, these constraints on pose are quite acceptable in practice.

Haar Cascade is a machine learning object detection algorithm based on the concept of features proposed by Paul Viola and Michael Jones in their 2001 paper "Rapid Object Detection using a Boosted Cascade of Simple Features"[13]. The Viola-Jones object detection framework was the first object detection framework to provide competitive object detection rates in real time. This, plus the fact that the algorithm is robust, i.e. it has a very high true-positive detection rate and a very low false-positive rate, is why it is so widely used and why we chose it for our problem.

The algorithm requires a lot of positive images of faces and negative images without faces to train the classifier. There are four stages:

1. Haar Feature Selection

2. Creating an Integral Image

3. Adaboost Training

4. Cascading Classifiers

The first step is to extract the Haar Features. The three types of features used for this algorithm are shown below:

Figure 12: Haar features

Each Haar Feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.

Each Haar Feature considers such adjacent rectangular regions at a specific location in the detection window. For example, one of the properties that all human faces share is that the nose bridge region is brighter than the eyes, so we get the following Haar Feature:

Figure 13: Human edge features

Another example comes from the property that the eye region is darker than the upper-cheeks:

Figure 14: Human line features

To give an idea of how many Haar Features can be extracted from an image: in a standard 24x24 pixel sub-window there are a total of 162336 possible features. After extraction, the next step is to evaluate these features. Because of the large quantity, each feature is evaluated in constant time using the integral image representation; this speed is the main advantage of Haar Features over more sophisticated alternative features. Even so, it would still be prohibitively expensive to evaluate all Haar Features when testing an image. Thus, a variant of the learning algorithm AdaBoost is used both to select the best features and to train classifiers that use them.
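The following sketch illustrates both points in plain Python: an integral image (summed-area table) that yields any rectangle sum with four lookups, and an enumeration that reproduces the feature count for a 24x24 window. The function names are illustrative, and the count assumes the five usual feature shapes (two-rectangle and three-rectangle features in both orientations plus one four-rectangle feature), which the text does not spell out.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels above and to the left of (x, y).
    """
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def edge_feature(ii, x, y, w, h):
    """Two-rectangle feature: white half on the left, black half on the right.

    Follows the convention in the text: black sum minus white sum.
    """
    white = rect_sum(ii, x, y, w, h)
    black = rect_sum(ii, x + w, y, w, h)
    return black - white


def count_features(window=24, shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count every scaled and shifted copy of each base shape in the window."""
    total = 0
    for base_w, base_h in shapes:
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                total += (window - w + 1) * (window - h + 1)
    return total


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
table = integral_image(image)
print(rect_sum(table, 1, 1, 2, 2))      # 28 = 5 + 6 + 8 + 9
print(edge_feature(table, 0, 0, 1, 3))  # 3 = (2 + 5 + 8) - (1 + 4 + 7)
print(count_features())                 # 162336
```

The table is built once per image, after which every feature costs a fixed number of additions regardless of its size.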

AdaBoost constructs a “strong” classifier as a weighted linear combination of simple “weak” classifiers. During detection, a window of the target size is moved over the input image, and for each subsection of the image the Haar Features are calculated. Each feature value is then compared to a learned threshold that separates non-faces from faces. Each Haar Feature is only a “weak” classifier, so a large number of Haar Features are necessary to describe a face with sufficient accuracy. The features are therefore organized into cascade classifiers to form a strong classifier.
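A minimal sketch of such a weighted vote is given below. It follows the decision rule from the Viola-Jones paper (a window is accepted when the weighted votes reach half of the total weight), but the training step is omitted and the thresholds, polarities and weights in the example are made-up numbers.

```python
def weak_classify(feature_value, threshold, polarity):
    """Decision stump: vote 1 if polarity * value < polarity * threshold."""
    return 1 if polarity * feature_value < polarity * threshold else 0


def strong_classify(feature_values, stumps):
    """Weighted vote over stumps given as (feature_index, threshold, polarity, alpha)."""
    votes = sum(
        alpha * weak_classify(feature_values[index], threshold, polarity)
        for index, threshold, polarity, alpha in stumps
    )
    total_weight = sum(alpha for _, _, _, alpha in stumps)
    return 1 if votes >= 0.5 * total_weight else 0


# Hypothetical stumps; real values come out of AdaBoost training.
stumps = [(0, 10, 1, 0.9), (1, 5, -1, 0.4), (2, 0, 1, 0.3)]
print(strong_classify([3, 2, -1], stumps))  # 1: weighted votes 1.2 of 1.6
print(strong_classify([20, 9, 5], stumps))  # 0: weighted votes 0.4 of 1.6
```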

The cascade classifier consists of a collection of stages, where each stage is an ensemble of weak learners. The stages are trained using the Boosting technique which takes a weighted average of the decisions made by the weak learners and this way provides the ability to train a highly accurate classifier. A simple stage description of the Cascade classifier training is given in the figure below:

Figure 15: Cascade classifier training
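At detection time, the benefit of the cascade is that most sub-windows are rejected by an early, cheap stage. The sketch below shows only this early-exit control flow; the stages are hypothetical stand-in functions rather than trained boosted classifiers.

```python
def run_cascade(window, stages):
    """Return (is_face, stages_evaluated) for one sub-window.

    The window is rejected as soon as any stage says "not a face".
    """
    for index, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, index
    return True, len(stages)


# Hypothetical stages operating on precomputed feature values.
stages = [
    lambda w: w["edge"] > 10,
    lambda w: w["line"] > 5,
    lambda w: w["edge"] + w["line"] > 30,
]

print(run_cascade({"edge": 2, "line": 9}, stages))    # (False, 1)
print(run_cascade({"edge": 20, "line": 15}, stages))  # (True, 3)
```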

After the training phase, the Cascade face detector is stored as an XML file, which can then easily be loaded in any program and used to get the rectangular areas of an image that contain a face. The haarcascade_frontalface_default.xml[14] we use is a Haar Cascade for detecting frontal faces provided by OpenCV, a library of programming functions mainly aimed at real-time computer vision.

In videos of moving objects, such as the camera feed that we are using, one usually does not need to apply face detection to each frame, as it is more efficient to use tracking algorithms. However, since the emotion recognition algorithm is slower than face detection, we process frames less often, which makes tracking algorithms less reliable. Therefore, we simply apply face detection to each processed frame.
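Loading the cascade file and applying it to a frame could look roughly as follows with OpenCV's Python bindings. This is a sketch rather than the project's code: it assumes the `opencv-python` package (which ships the XML file under `cv2.data.haarcascades`), the `scaleFactor` and `minNeighbors` values are common starting points and not the authors' settings, and treating the largest detected face as the user is an assumption made only for illustration.

```python
def detect_faces(frame_bgr):
    """Return a list of (x, y, w, h) face rectangles found in a BGR frame."""
    import cv2  # imported lazily so largest_face() can be used without OpenCV

    # For a video loop the classifier would be loaded once, not per frame.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(value) for value in face) for face in faces]


def largest_face(faces):
    """Pick the rectangle with the largest area, or None if no face was found."""
    if not faces:
        return None
    return max(faces, key=lambda face: face[2] * face[3])


print(largest_face([(0, 0, 10, 10), (5, 5, 40, 30)]))  # (5, 5, 40, 30)
```

The selected rectangle can then be cropped from the frame and passed on to the emotion recognition step.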

Emotion recognition

Emotion recognition is a computer technology that identifies human emotion, in our case - in digital frontal images of human faces. The type of emotion recognition we use is a specific case of image classification as our goal is to classify the expressions on frontal face images into various categories such as anger, fear, surprise, sadness, happiness, and so on.

As of yet, there still isn’t one best emotion recognition algorithm as there is no way to compare the different algorithms with each other. However, for some emotion recognition datasets (containing frontal face images, each labeled with one of the emotion classes) there are accuracy scores for different machine learning algorithms trained on the corresponding dataset.

It is important to note that the state-of-the-art accuracy score of one dataset being higher than the score for another dataset does not mean that in practice using the first dataset will give better results. The reason for this is that the accuracy scores for the algorithms used on a dataset are based only on the images from that dataset.

The first dataset we attempted to train a model on was the widely used Extended Cohn-Kanade dataset (CK+)[15], currently consisting of 981 images. We chose it because the state-of-the-art algorithm for it had an accuracy of 99.7%[16][17], much higher than for any other dataset. Some samples from the CK+ dataset are provided below:

Figure 16: CK+ dataset

On the CK+ dataset, we tried using the Deep CNN model called Xception[18] that was proposed by Francois Chollet. As in the article “Improved Facial Expression Recognition with Xception Deep Net and Preprocessed Images”[19], where the authors got up to 97.73% accuracy, we used preprocessed images and then used the Xception model for best results. However, even though we obtained very similar accuracy, the model was quite inaccurate when we tried using it in practice, correctly recognizing no more than half of the emotion classes.

Then we tried the same model with another very widely used emotion recognition dataset, fer2013[20]. This dataset, consisting of 35887 face crops, is much larger than the previous one. It is quite challenging, as the depicted faces vary significantly in terms of illumination, age, pose, expression intensity, and occlusions that occur under realistic conditions. The provided sample images in the same column depict identical expressions, namely anger, disgust, fear, happiness, sadness, surprise, and neutral (the 7 classes of the dataset):

Figure 17: Fer2013 dataset

Using Xception with image preprocessing on the fer2013 dataset led to around 66% accuracy[21], but in practice, the produced model was much more accurate when testing it on the camera feed.

Based on the results of the previous two datasets, we assumed that bigger datasets probably lead to better accuracy in practice. We decided to try using Xception for one more emotion recognition dataset: FERG[22][23], a database of 2D images of stylized characters with annotated facial expressions. This dataset contains 55767 annotated face images, a sample of which is shown below:

Figure 18: FERG dataset

The state-of-the-art for this dataset is 99.3%[24][25] and by using Xception with image preprocessing on it we got very similar results. However, even though this dataset was the biggest so far, in practice it performed the worst of the three. This is most likely due to the fact that the images are not of real human faces.

Based on all of the above-mentioned results, we concluded that the best dataset to use in our case would be fer2013. Having chosen the dataset, we decided to use the state-of-the-art method[26][27] for it, which uses a VGG network and achieves 72.7% accuracy. Note that there is a method that achieves 76.82% accuracy[28] by using extra training data, but we decided not to use such methods.

The VGG architecture is similar to the VGG-B[29] configuration but with one CCP block less, i.e. CCPCCPCCPCCPFF:

Figure 19: VGG architecture

Note that in convolutional neural networks, C means convolutional layers which convolve the input and pass its result to the next layer, P is for pooling layers which reduce the dimensions of the data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer, and F is for the fully-connected layers, connecting every neuron in one layer to every neuron in another layer.

As shown above, the VGG network consists of 10 weight layers (the convolution and fully-connected ones). All of the convolution layers apply a 3x3 kernel, and the number of channels differs per CCP block. The first fully-connected layer after the last CCP block has 1024 channels. The second fully-connected layer performs the 7-class emotion classification, so it contains 7 channels. There is also dropout after each block (a randomly chosen set of neurons is ignored during the training phase), which improves the validation accuracy by around 1%.
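The count of 10 weight layers follows directly from the block string given earlier. The short sketch below only tallies the letters of that string; it is a bookkeeping aid with a helper name of our own choosing, not a model definition.

```python
from collections import Counter

ARCHITECTURE = "CCPCCPCCPCCPFF"  # block string quoted in the text


def count_layers(architecture):
    """Tally layer types; only C (convolution) and F (fully-connected) carry weights."""
    counts = Counter(architecture)
    return {
        "convolution": counts["C"],
        "pooling": counts["P"],
        "fully_connected": counts["F"],
        "weight_layers": counts["C"] + counts["F"],
    }
```

Calling `count_layers(ARCHITECTURE)` gives 8 convolution layers, 4 pooling layers and 2 fully-connected layers, so 8 + 2 = 10 weight layers.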

Using the VGG method, a model is generated and stored that can then be loaded and used by our program for the camera feed. As mentioned in the previous section, at each processed frame we apply face detection, which provides us with a rectangle area containing the face. We can then easily crop the face. Further preprocessing includes converting the image to grayscale, applying histogram equalization for illumination correction, and normalizing the image by subtracting the mean and dividing by the standard deviation over all pixels. After preprocessing, we input the image into the network and extract, for each emotion class, the probability that it is the emotion shown by the person’s face.
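The two numeric preprocessing steps can be sketched in plain Python as below, assuming the face has already been cropped and converted to an 8-bit grayscale image stored as a list of rows. This is an illustration of the idea only: the function names are our own, and in practice a library routine such as OpenCV's `cv2.equalizeHist` would normally be used instead of a hand-written loop.

```python
import math


def equalize_histogram(gray):
    """Spread the intensity values of an 8-bit grayscale image over 0..255."""
    flat = [p for row in gray for p in row]
    hist = [0] * 256
    for p in flat:
        hist[p] += 1
    # Cumulative distribution of intensities.
    cdf, running = [0] * 256, 0
    for value in range(256):
        running += hist[value]
        cdf[value] = running
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to equalize
        return [row[:] for row in gray]
    return [
        [round((cdf[p] - cdf_min) / (n - cdf_min) * 255) for p in row]
        for row in gray
    ]


def standardize(gray):
    """Subtract the mean and divide by the standard deviation over all pixels."""
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((p - mean) ** 2 for p in flat) / len(flat))
    std = std or 1.0  # avoid division by zero for a constant image
    return [[(p - mean) / std for p in row] for row in gray]
```

A cropped face would then be passed through `standardize(equalize_histogram(face))` before being fed to the network.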

When a person is talking and moving it is possible that processed frames output quite different probabilities and we don’t want to send to the chatbot distinctly different emotional states at each frame. For this purpose, we stabilize the emotional state and make it more consistent by outputting the average probabilities from the last n (10 in our case) processed frames. This way it will take a few seconds to switch to a different emotion.
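The averaging over the last n processed frames can be sketched with a fixed-length queue. The class name is hypothetical, and the window of 10 matches the value mentioned above.

```python
from collections import deque


class EmotionSmoother:
    """Average the per-class probabilities over the last `window` processed frames."""

    def __init__(self, window=10):
        # A deque with maxlen drops the oldest frame automatically.
        self.history = deque(maxlen=window)

    def update(self, probabilities):
        """Add the newest frame and return the averaged probabilities."""
        self.history.append(list(probabilities))
        n = len(self.history)
        return [
            sum(frame[i] for frame in self.history) / n
            for i in range(len(probabilities))
        ]
```

With a window of 2, feeding `[1.0, 0.0]` and then `[0.0, 1.0]` yields `[0.5, 0.5]` for the second frame, so a single outlier frame cannot flip the reported emotion on its own.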

Chatbot design

Once the facial data is obtained from the algorithm, the robot can use this information to better help the elderly fulfil their needs. What this chatbot will look like is described in the following section.

Why facial emotion recognition

For the chatbot design, there are many possibilities to use different sensors to gather extra data. However, more data does not equal a better system. It is important to have a good idea of what data will be used for what. For example, it is possible to measure the heart rate of the elderly person or track his/her eye movements, but what would it add? These are the important questions to ask when building such a chatbot. For our chatbot design, we chose to use emotion recognition based on facial expressions to enhance the chatbot. The main reason for this is that experiments have shown that an empathic computer agent can promote a more positive perception of the interaction.[30] For example, Martinovski and Traum demonstrated that many errors can be prevented if the machine is able to recognize the emotional state of the user and react sensitively to it.[30] Knowing the state of a person allows the chatbot to adapt to that state and avoid annoying the person, e.g. by not pushing to keep talking when the person is clearly annoyed by the robot. Such pushing can break down the conversation and leave the person disappointed. If the chatbot can detect that the queries it gives make the person's valence more negative, it can try to change the way it approaches the person.

Other studies propose that chatbots should have their own emotional model, as humans themselves are emotional creatures. Research by Lagattuta and Wellman [31] on the connection between a person's current feelings and the past confirmed that past experiences may influence emotions in certain situations. For example, a person who experienced a negative event in the past can experience sad emotions while encountering a positive event in the present. This process is known as episodic memory formation. In generating a response, humans use these mappings. To create such mappings and generate responses, the robot’s memory should consist of the following components [31]:

- Episodic Memory - Personal experiences

- Semantic Memory - General factual information

- Procedural Memory - Task performing procedures

Of these three components, emotional content is most closely associated with episodic memory (long-term memory). The personal experience can be that of the robot or of any user with whom the robot is associated. For the robot to speak successfully to the user, it should have its own episodic memory as well as a memory of its user. The study proposes that robots develop their own emotional memories to enhance response generation by:

- Identifying the user’s current emotional state.

- Identifying perception for the current topic.

- Identifying the overall perception of the user regarding the topic.

The chatbot uses two of these techniques. When the robot is operational, it not only continuously monitors the current state of the person but it also provides options to sense how the user feels about a subject.

What is Emotion

Before discussing what will be done with the recognized emotions, we briefly discuss what exactly emotions are and how an agent's feelings can be influenced. Everyone has some firsthand knowledge of emotion. However, precisely defining what it is turns out to be surprisingly difficult. The main problem is that "emotion" refers to a wide variety of responses. It can, for example, refer to the enjoyment of a nice quiet walk, guilt about not buying a certain stock, or sadness while watching a movie. These emotions differ in intensity, valence (positivity-negativity), duration and type (primary or secondary). Primary emotions are initial reactions, and secondary emotions are emotional reactions to an emotional reaction. To pin down the construct of emotion, researchers typically characterize several different features of a prototypical emotional response. Such a response has typical features, which may not be present in every emotional reaction. [32]

One feature of emotion is its trigger. This is a psychologically relevant situation, which is either external (e.g. seeing a sad movie) or internal (e.g. being scared of an upcoming situation).

The second feature of emotion is attention. This means that the situation has to be attended to in order for an emotion to occur.

The third feature is the appraisal. In this feature, situations are appraised for their effect on one's currently active goals. Such goals are based on values, cultural milieu, current situational features, societal norms, developmental stage of life and personality.[33] This means that people can react differently to the same situation because their goals differ.

Once the situation has been attended to and appraised against the goals, this results in an emotional response, which is the fourth feature. This response can come in three forms: experiential, behavioural, and central and peripheral physiological reactions. The experiential reaction is mostly described as "feeling", but this is not equivalent to emotion. Emotions can also influence the behaviour of a person; think of smiling or eye-widening. Finally, emotions can also make us do something (e.g. fleeing, laughing). The final feature of emotion is its malleability. This means that once the emotional response is initiated, the precise path is not predefined and can be manipulated by external inputs. This is the most crucial aspect, as it means that there is a possibility of regulating the emotions of an agent. These five features together form a sequence in which a response to a situation is generated, and this response in turn influences the situation again. [32]

Regulating Emotion

Now that it is known how emotions can be described, it is important to know how they can be manipulated so that an agent will feel less bad. As stated before, emotions can be manipulated, either intrinsically or extrinsically. For this research the extrinsic part is especially important, as this is where the chatbot can influence the person's emotional state. There are generally five families of emotion regulation strategies, each of which is described below.

- Situation selection: choosing whether or not to enter potentially emotion-eliciting situations.

- Situation modification: changing the current situation to modify its emotional impact.

- Attentional deployment: choosing which aspects of a situation to focus on.

- Cognitive change: changing the way one constructs the meaning of the situation.

- Response modulation: influencing the emotional response once the emotion has been elicited.

The goal of the robot is then to regulate the emotions of the elderly person in such a way that they are more in line with that person's own goals. To do this, the robot needs to know whether the situation is controllable or not. For example, when a person is feeling a little bit sad, the robot can try to talk to the person to change their feeling into a more positive one. However, when the situation is not controllable, the robot should not try to change the emotion of the elderly person, but help the person accept what is happening to them. For example, when the person's house cat has just died, this can have a huge impact on how the person is feeling. Trying to make them happy will not work, because the situation is uncontrollable. In this case, the robot can only try to reduce the duration and intensity of the negative emotion by comforting them.

It is of course not possible to know the exact long-term goals of the elderly person. It is therefore important that the robot can adapt how it reacts to different situations based on the person's long-term goals. This is not possible in this project due to time and resource constraints. [32]

Robot user interaction

In this section, the interaction between the elderly person and the robot is described in detail, so that it is clear what the robot needs to do once it has the valence and arousal information from the emotion recognition. As described in the robot design section, the robot will be around 1.2 meters high, with the camera module mounted on top to get the best view of the person's head. The robot will talk to the user through a text-to-speech module, as a voice conveys the robot's message better than just displaying it. The user can interact with the robot by speaking to it. A speech-to-text module converts this speech into text input that the robot can use. The main goal of the robot in the interaction is to make sure that the emotions the person is feeling are in line with their long-term goals. It will do this by making smart response decisions. These decisions are also based on the arousal of the emotion, as high-arousal emotions (e.g. anger) may also cause the user to deviate from their long-term goal. As stated before, implementing individual long-term goals is not feasible for this project, but this may be researched in future projects. Instead, the main focus will be on the following long-term goal:

'Have positive valence'

This is chosen because everyone wants to feel happy and live healthily, regardless of age or ethnicity. This means that the robot will try to keep the person on this long-term goal as much as it can by saying or doing the right things. It will do this according to the emotion regulation methods described in the Regulating Emotion subsection. These techniques are the core of how the chatbot works, as they are also used by humans to regulate their own emotions. Not all of the described methods are applicable to the robot, due to its limited intelligence. It is, for example, difficult to do situation selection, as this requires knowledge of the future. Response modulation is also difficult, as it has more to do with the internal state of the user. Finally, attentional deployment would require the robot to understand many different situations and their positive aspects, which is not possible in this project. However, situation modification and cognitive change are possible with the robot. How these are implemented is explained in the next subsections.

Situation modification

The user can end up in many different situations that may divert them from their long-term goal. The robot should try to prevent such situations as much as it can and instead steer the user towards a good situation. Concretely, the goal of the user is to have a positive valence. This means that if the emotion recognition detects that the person has a slightly negative valence, the robot may try a few different things. The robot can ask the user: "Would you like to do something?" The user can answer as they like, and when the user accepts, the robot can suggest a few things it can do, such as calling family or friends, or having a simple conversation about a different subject, in order to make the person happier.

Cognitive Change

This technique tries to modify how the user constructs the meaning of a specific situation. It is most applicable in high-arousal situations, because these usually involve uncontrollable emotions. This uncontrollability means that the robot should try to reduce the intensity and duration of the emotion through cognitive change. For example, during a state of very high arousal the robot should try to comfort the person by asking questions and listening, in order to change how the user constructs this bad emotion into a less intense, higher-valence emotion.

Needs and values of the elderly

The most important task of the robot is to support and help the elderly person as much as it can. This has to be done correctly, however, as ethical boundaries can easily be crossed. The most important ethical value of an elderly person is their autonomy, the power to make one's own choices. [34] For most elderly people this value has to be respected more than anything. It is therefore vital that in the chatbot design the elderly person is not forced to do anything. The robot can merely ask. Problems can also arise if the person asks the robot to do something that would hurt him/her. However, this robot design is not able to do anything physical that could hurt or help the person. Its most important task is supporting the mental health of the person. Such a robot design is similar to the PARO robot, a robotic seal that helps elderly people with dementia to feel less lonely. The problem with PARO's design is that it cannot read the person's emotions: it is treated as a hugging robot, which makes sensing the face very difficult. Therefore, different robot designs were made for this project. These can be seen in the robot design section.

Implementation of facial data in chatbot

Once the facial recognition software has gathered information about the emotional state of the person, this information has to be put to use. This section discusses how the state of the person helps the robot to personalize how it reacts to the person.

Data to use

The recognition software detects whether there is a face in view, which emotion is expressed, and with what intensity. This data can then be used to determine how positive or negative the person is feeling. Negative emotions are very important to detect, as prolonged negative feelings may lead to depression and anxiety. Recognizing these emotions in time can prevent this, avoiding the need for help from medical practitioners.[35] For our model the following outputs will be used:

- Is there a face in view, Yes/No

- How probable are the different emotions, 0 to 1 for every emotion.

Emotional Model

One very important choice is what kind of emotional model is used, as there are many different models for describing complex emotions. The most basic interpretation uses six basic emotions that were found to be universal between cultures. This interpretation was developed during the 1970s, when psychologist Paul Ekman identified these six basic emotions as happiness, sadness, disgust, fear, surprise, and anger.[36] As the goal of the robot is to help elderly people through the day, it will probably not be necessary to have a more complex model, as it is mostly about the presence of negative feelings. The further classification of which emotion is expressed can help with finding a way to help the person. These six basic emotions will be represented in the Circumplex Model. In this model, emotional states are represented on a two-dimensional surface defined by a valence (pleasure/displeasure) axis and an arousal (activation/deactivation) axis. The two-dimensional approach has been criticized on the grounds that subtle variations between certain emotions that share a common core affect, e.g. fear and anger, might not be captured in fewer than four dimensions.[37] Nevertheless, for this project the two-dimensional representation will suffice, as the robot does not need to be 100% accurate in its readings: this project uses a region of the circle to determine what action to undertake, so if the emotional state is slightly misread, the robot still follows the right trajectory. Such a valence/arousal diagram looks like the following:

Figure 19: Valence and arousal diagram[38]

How negative or positive a person is feeling can be expressed by the valence state of the person. Valence is a measure of affective quality, referring to the intrinsic goodness (positive feelings) or badness (negative feelings). For some emotions the valence is quite clear, e.g. the negative affect of anger, sadness, disgust, and fear, or the positive affect of happiness. Surprise, however, can be either positive or negative depending on the context.[39] The valence is calculated simply from the intensity of each emotion and whether it is positive or negative. When the results are not as expected, the weights of the different emotions can be altered to better fit the situation.

Arousal is a measure of how active or passive the person is. This means, for example, that a person who is very angry has very high arousal, and that a person who is feeling sad has low arousal. This extra axis helps to better define what the robot should do, as with high arousal a person might panic or hurt someone or themselves. Where exactly the six emotions lie differs from person to person, but their general locations can be seen in the diagram above.

Implementation of valence and arousal

Once the presence of every emotion has been detected, along with whether the individual is feeling positive or negative and with what arousal, the robot can use this information. How it is used is very important, as history has shown that if the robot does not meet the needs and desires of its potential users, the robot will go unused. Therefore, based on the survey, we chose three different trajectories for which the chatbot has responses. These three can be seen in the picture below:

Figure 20: Trajectories in Valence and arousal diagram

Trajectory 1 is where the user has high arousal and low valence. How to deal with these emotions is based on the survey results and literature[40]. From these, the most important factor in dealing with such strong emotions was to listen to the person and ask questions about why he or she is feeling angry. Therefore, the main thing the robot does is ask if something is bothering the person and if it can help. If the user confirms, the conversation immediately moves into a special environment where a psychiatric chatbot was set up. This chatbot is based on the ELIZA bot, with additional functionalities added from the ALICE bot. The ELIZA bot is quite old but does exactly what the users want: listen and ask questions. It works by rephrasing the input of the user in such a way that it evokes a continuous conversation.[41] An example of user interaction can be seen in the figure below:

Figure 21: Example interaction Trajectory 1
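The rephrasing idea behind ELIZA can be illustrated with a few lines of Python. This is only a toy sketch of the general technique: the patterns, the reflection table and the function names are invented for this example and are not taken from the actual ELIZA script or from our SIML code.

```python
import re

# Hypothetical pronoun swaps used to turn the user's words back on them.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "mine": "yours"}

# Hypothetical (pattern, response template) rules, tried in order.
RULES = [
    (re.compile(r"\bi feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bbecause (.*)", re.IGNORECASE), "Is that the real reason?"),
]


def reflect(fragment):
    """Swap first-person words for second-person ones."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(text):
    """Rephrase the user's input as a question, or fall back to a prompt."""
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please tell me more."
```

For instance, `respond("I feel angry at my neighbour.")` returns "Why do you feel angry at your neighbour?", which keeps the user talking without the bot needing to understand the situation.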

Trajectory 2 is when the person is sad with low to neutral arousal. From the survey and literature, it became clear that in this state there are many more things the robot should be able to do. In the survey, people suggested that the robot should ask questions, but also that it should be able to suggest some physical exercises, tell a story, and a few other things. These have all been implemented in this trajectory. An interaction could look like the following:

Figure 22: Example interaction Trajectory 2(1/2)
Figure 23: Example interaction Trajectory 2(2/2)

Trajectory 3 is when the person has high arousal with a slightly negative valence. In terms of what the robot does, this lies in between the first two trajectories. As in trajectory 1, talking and listening to the user is very important. However, due to the higher valence, talking and listening is not always wanted. Therefore, the robot should first ask if the person wants to talk about their feelings or would rather do something else. If the person wants to do something else, all the options the robot offers are listed, just like in trajectory 2. An interaction could look like the following:

Figure 24: Example interaction Trajectory 3(1/2)
Figure 25: Example interaction Trajectory 3(2/2)

Chatbot implementation

The chatbot is built using Visual Studio 2019 with SIML code. SIML is an adaptation of the XML language made specifically for building chatbots. The chatbot receives the emotions from the emotion recognition software and uses them to generate an appropriate response according to the literature and survey results.

SIML

As stated before SIML is a special language developed for chatbots. SIML stands for Synthetic Intelligence Markup Language [42] and has been developed by the Synthetic Intelligence Network. It uses input patterns that the user says or types and uses this to generate a response. These responses are generated based on the emotional state of the person and the input of the user. This is best explained with an example.

Figure 25: Example of an SIML model

In this example, one model is shown. Such a model houses a user input and robot response. Inside this model, there is a pattern tag. This tag specifies what the user has to say to get the response that is stated in the response tag. So, in this case, the user has to say any form of yes to get the response. This yes is between brackets to specify that there are multiple ways to say yes(e.g. yes, of course, sure etc.). The user says yes or no based on a question the robot asked. To specify what this question was, the previous tag is used. Here the previous response of the robot is used to match when the user wants to interact with the robot. So in short, this code makes sure that when the user says any form of yes on a question it will give a certain response. The chatbot consists of a lot of these models to generate an appropriate response. However, there is no implementation of emotions shown in this example. How the robot deals with different emotions will be explained in the next example.

Figure 26: Emotion selection of scenarios

This model is the first interaction of the robot with the user. This model does not use user input, but a specific keyword used to check the emotional state of the person. This means that every time the keyword AFGW is sent to the robot it will get into this model. It can be seen that in the response there are 4 different scenarios the robot can detect. These are separated by various if-else statements. These 4 scenarios are the following:

- Positive valence

- Negative valence and low arousal(Trajectory 2)

- Small negative valence and high arousal(Trajectory 3)

- High negative valence and high arousal(Trajectory 1)

In each of these scenarios, there is a section between think tags that is executed in the background. In this case, the robot sends the user to a specific concept. A concept contains all the responses for that specific situation. So when the robot detects that the user has low arousal but negative valence, it sends them into a specific concept to deal with this state. In these concepts, the methods described above are implemented.
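The if-else selection of the four scenarios can be mirrored in Python as follows. Note that this is not the SIML code itself, and that the numeric boundaries are assumptions made for illustration: we assume valence and arousal lie in [-1, 1], that "high arousal" means arousal at or above 0, and that "high negative valence" means valence at or below -0.5. The text above does not state the actual thresholds.

```python
def select_scenario(valence, arousal, arousal_split=0.0, strong_negative=-0.5):
    """Map a (valence, arousal) pair to one of the four scenarios.

    arousal_split and strong_negative are hypothetical thresholds chosen
    for this sketch, not values taken from the project.
    """
    if valence >= 0:
        return "positive"
    if arousal < arousal_split:
        return "trajectory 2"  # negative valence, low arousal
    if valence > strong_negative:
        return "trajectory 3"  # small negative valence, high arousal
    return "trajectory 1"      # high negative valence, high arousal
```

Because each scenario covers a whole region of the diagram, a slightly misread valence or arousal value usually still lands in the same scenario.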

Integration

In this section we discuss how the emotion recognition is integrated with the chatbot, how the communication happens, how everything comes together, and the user interface.

Figure 15: CK+ dataset

Emotion recognition

One of the main components displayed when someone runs our program is an image consisting of a frame from the camera feed and two diagrams representing the emotional state based on the facial expression. This image is updated as soon as possible, at each processed frame. Frames are processed one at a time, so, due to time constraints, some are omitted.

At each processed frame, we first apply face detection, which gives us the coordinates of a rectangle surrounding the person’s face. We then draw that rectangle on the frame, and the resulting image is the middle part of the displayed image.

Then, after preprocessing the frame as previously described, we input it into the emotion recognition algorithm. From the output from the emotion recognition we extract the probabilities for each emotion class, as explained in the emotion recognition section. This output is used to make the bar chart which is the left part of the displayed image.

Finally, for each emotion class, we multiply the probability by the corresponding default valence and arousal values for that class in the valence/arousal diagram, then we sum all the resulting values to compute an estimation of the valence and arousal values of the person. These values are then translated into coordinates in the valence/arousal diagram. The right part of the displayed image consists of a copy of the valence/arousal diagram with a red circle drawn at the computed coordinates, i.e. the circle represents the valence and arousal values of the person.
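The weighted sum described above can be sketched as follows. The default (valence, arousal) coordinates per class are placeholder numbers chosen only to make the example run; the values actually read off the valence/arousal diagram are not listed in this text.

```python
# Hypothetical default (valence, arousal) coordinates per emotion class,
# on a -1..1 scale. These numbers are illustrative assumptions.
DEFAULT_COORDS = {
    "anger": (-0.6, 0.8),
    "disgust": (-0.7, 0.3),
    "fear": (-0.7, 0.7),
    "happiness": (0.8, 0.4),
    "sadness": (-0.7, -0.5),
    "surprise": (0.2, 0.8),
    "neutral": (0.0, 0.0),
}


def estimate_valence_arousal(probabilities, coords=DEFAULT_COORDS):
    """Probability-weighted sum of the default coordinates of each class."""
    valence = sum(p * coords[emotion][0] for emotion, p in probabilities.items())
    arousal = sum(p * coords[emotion][1] for emotion, p in probabilities.items())
    return valence, arousal
```

With these placeholder coordinates, a face judged to be half happy and half neutral gives a valence of about 0.4 and an arousal of about 0.2, i.e. a point halfway between the two class locations.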

Chatbot

We want the emotion recognition to work simultaneously with the chatbot. So that the chatbot project starts automatically when we run the main Python script, we use the executable file produced by Visual Studio the last time the project was built. We run the continuous emotion recognition in one thread and execute the executable file in a separate thread. The emotion information is sent to the chatbot by writing the valence/arousal values to a comma-separated values file at each processed frame, which is then read by the chatbot.
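A minimal sketch of the file-based hand-over is given below. The function names and the three-decimal format are assumptions for this example. To reduce the chance of the reader seeing a half-written file, the sketch writes to a temporary file and then swaps it into place with `os.replace`; this is a technique we substitute here for illustration and not necessarily the synchronization used in the project. The reader is shown in Python for symmetry, although the actual reader is the chatbot executable.

```python
import csv
import os
import tempfile


def write_state(path, valence, arousal):
    """Write one 'valence,arousal' row, replacing the previous file contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as handle:
        csv.writer(handle).writerow([f"{valence:.3f}", f"{arousal:.3f}"])
    os.replace(tmp_path, path)


def read_state(path):
    """Read the most recent valence/arousal pair back as floats."""
    with open(path, newline="") as handle:
        valence, arousal = next(csv.reader(handle))
    return float(valence), float(arousal)
```

Overwriting a single row at each processed frame keeps the file small, so the reader always sees only the latest state.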

Conversation

First, we need a chat-like user interface. We use TkInter[43], Python's de facto standard Graphical User Interface package. The chat is displayed right below the above-mentioned image. All messages between the chatbot and the user are displayed in the chat window.

If the user wants to send a message to the chatbot, they type it in the text field at the bottom of the window and then press the “Send” button. The message is then not only displayed in the chat but also written to a file, which the chatbot then reads. Note that since two processes can’t access the same file simultaneously, special attention has been paid to synchronizing these communications.

Each time the robot reads the user’s message, it generates a corresponding output message and writes it to a file. In order for our main program to read that file, we have made a separate thread which constantly checks the file for changes. Once a change has been detected, the new message is read and displayed in the chat window for the user to see.
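The checking thread can be sketched as a small polling loop. The class and function names, the 0.2 second interval, and the use of the file's modification time and size as a change signature are all assumptions made for this example, not a description of the project's exact code.

```python
import os
import threading


class FileWatcher:
    """Detect changes to a file by comparing its modification time and size."""

    def __init__(self, path):
        self.path = path
        self._signature = None

    def poll(self):
        """Return the file's text if it changed since the last poll, else None."""
        try:
            stat = os.stat(self.path)
        except FileNotFoundError:
            return None
        signature = (stat.st_mtime_ns, stat.st_size)
        if signature == self._signature:
            return None
        self._signature = signature
        with open(self.path, encoding="utf-8") as handle:
            return handle.read().strip()


def watch(path, on_message, stop_event, interval=0.2):
    """Poll the file until stop_event is set, passing new messages to on_message."""
    watcher = FileWatcher(path)
    while not stop_event.is_set():
        message = watcher.poll()
        if message:
            on_message(message)
        stop_event.wait(interval)
```

The loop would be started with something like `threading.Thread(target=watch, args=(path, show_in_chat, stop_event), daemon=True).start()`, where `show_in_chat` stands for whatever callback appends the message to the chat window.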

Consequences of erroneous emotion assessments

As designed, R.E.D.C.A.'s emotion recognition accuracy is on par with current state-of-the-art CNNs. However, this accuracy is not perfect, and as such the small percentage of errors must not be neglected. Errors made by the robot were generally tolerated by the surveyed group, but it is nonetheless important to elaborate on the effects a mistaken emotion assessment may have on the user. The false positives and false negatives for the three main emotions, sadness, anger and joy, are considered below. Aside from the confusion a wrong assessment may cause, it may have other consequences for the user.

Sadness

False Positive: When the user is not in a sad state, engaging them with the chatbot's sadness routine results in a reduction of happiness. Regarding happiness, the sadness subroutine may fail and increase their anger by asking too many questions. For other emotional states, the sadness subroutine may not be as ineffective, as it has been created to be supportive to the user.

False Negative: When another subroutine is applied to a sad user, they may resent the robot because it does not recognise their sadness and does not function as the support device they expect it to be. After repeated false negatives, the robot may be harmed or disowned by the user.

Anger

False Positive: When a user isn't angry but the routine is triggered nonetheless, R.E.D.C.A. may become inquisitive towards the user and try to calm them down. However, if the user is already calm, this routine will end shortly and only alters the user's state marginally.

False Negative: Not engaging an angry user with the anger subroutine may cause the user's emotion to escalate, as they are not treated calmly and with the correct questions. If the happy subroutine is chosen instead, this may frustrate the user further, causing stress and harm to the user as well as danger to the robot.

Joy

False Positive: As stated above, a happy subroutine may do more harm than good when the user's arousal and valence levels are quite negative. Being happy around angry people may further escalate their emotions. However, a happy subroutine is likely to help distract a sad user from their sadness and increase their valence level instead.

False Negative: The robot will act worried and inquisitive towards a happy user, which may reduce their overall happiness.

Scenarios

This section portrays various scenarios. These scenarios elaborate on moments in our theoretical users’ lives where the presence of our robot may improve their emotional wellbeing. They also demonstrate the steps both robot and user may perform and consider edge cases.


Scenario 1: Chatty User

Our user likes chatting and prefers being accompanied. Unfortunately, they cannot have human company at all their preferred times. In this case, the user may engage with their robot. It will recognise an activation phrase (e.g. “Hey Robot”) and listen to the user’s request, similar to current smart home devices. The robot combines the most recently registered emotion with the questions and phrases spoken to select appropriate responses. In this case, the expected request is “I wish to talk”, which prompts the robot to prioritise listening to the user and analysing their phrases. On cues such as pauses or a command from the user, the robot may respond with questions or statements generated from the analysed lines. For example, the user talks about their old friend “Angie”. The robot recognises that this is a person with attributes and may ask questions about “Angie”. This information may be stored and reused in future conversations. The final result is that the user’s emotional wellbeing does not suffer from being alone, because a chat companion fills this need.

Scenario 2: Sad Event User

The user has recently experienced a sad event, for example the death of a friend. It is assumed that the user will express more emotions on the negative side of the valence spectrum, including sadness, in the period after the event. The robot present in their house is designed to pick up on the changed set and frequency of emotions. Whenever sadness is recognised, the robot may ask for the user’s attention, and the user is given the choice to respond and start a conversation. To encourage the user to engage, the robot acts with care, applying regulatory strategies and tailoring its questions to the user’s needs. Once a conversation has started, the user may open up to the robot, and the robot asks questions and comforts the user by means of attentional deployment. Throughout, the robot may ask the user if they would like to contact a registered person (e.g. personal caregiver or family), whether or not a conversation has been started. This keeps all possible facets of coping within reach of the user. In the end, the user will have unburdened themselves, which will help relieve their sadness.

Scenario 3: Frustrated User

A user may have a bad day. Something has happened that has left them in a bad mood. Therefore, their interest in talking with anyone, including the robot, has diminished. The robot has noticed the change in their facial expressions and must alter its approach to reach its goal of conversation and improved wellbeing. To avoid conflict and the user lashing out at the harmless robot, it may engage the user by diverting their focus. Rather than letting the user dwell on what frustrated them, the robot encourages the user to talk about a past experience they have discussed before. It may also start a conversation itself by telling a joke, in an attempt to defuse the situation. If the user still does not wish to engage, the robot may ask if they wish to call someone. In the end, noncompliant users may need time, and it may be best for the robot not to engage further.

Scenario 4: Quiet User

Some users may not be as vocal as others. They won’t take the initiative to engage the robot or other humans. Fortunately, the robot will regularly ask for attention, asking the user if they wish to talk. It does so by attempting to start a conversation at appropriate times (e.g. outside of dinner and recreational events, when the user is at home). If the robot receives no input, it may ask again several times at short intervals. If there is still no response, the robot may be prompted to ask questions or tell stories on its own, until the user tells it to stop or, preferably, engages with the robot.


Conclusion

As a conclusion for this project, it can be stated that a complete framework for a care robot with emotion recognition and chatbot functionality has been made. The framework was made with extensive literature research about human emotions, facial expressions, state of the art software/hardware and user needs. The chatbot meets most of the user requirements. It is able to respond to textual input from the user. It is however not able to do this with speech input due to time limitations. The chatbot also has some high-level conversation ability with the necessary steps to call for help when needed. Furthermore, the chatbot has been designed for one specific case so there is no preset option available as stated in the user requirements. It is, however, able to store user data for later use due to the nature of SIML. Data such as name, age, gender, dog name can be integrated in the framework for example. Furthermore, the chatbot is able to react differently based on the detected emotional state using three trajectories. Each trajectory has been specifically designed to handle the current emotional state using survey feedback and literature as a guideline.


Discussion

The chatbot now consists of around 300 different SIML models that are able to handle different user input. This sounds like a lot, but for a great user interaction it would have to be scaled to roughly ten times the size. This would have to be done by letting people talk to the chatbot to learn which inputs are not correctly tagged. Furthermore, the chatbot now has three different trajectories that each give a certain response, but this can, of course, be expanded so that the chatbot can return more sophisticated responses. Also, the recognized emotion can be used more often in the chatbot, as it is now mainly used to select a specific trajectory. The option of using emotion changes within a trajectory has not been explored yet. It is, for example, possible that in trajectory 2 the person gets angry because of something the robot says. This should also be addressed by the chatbot in the future.

Github link

The code used to make R.E.D.C.A. can be found here: https://github.com/CPA-Aarts/0LAUK0-Group6

Milestones

Week Milestone
1 (20.04 - 26.04) Subject chosen
2 (27.04 - 03.05) Project initialised
3 (04.05 - 10.05) Facial/Emotional recognition research finalised
4 (11.05 - 17.05) Facial/Emotional recognition software developed
5 (18.05 - 24.05) Chatbot research finalised
6 (25.05 - 31.05) Chatbot implemented
7 (01.06 - 07.06) Facial/Emotional recognition software integrated in Chatbot
8 (08.06 - 14.06) Wiki page completed
9 (15.06 - 21.06) Chatbot demo video and final presentation completed
10 (22.06 - 28.06) N/A
11 (29.06 - 05.07) N/A

Logbook

Week 1

Name Total hours Tasks
Rick 3.5 Introduction lecture [1.5], meeting [1], literature research [0.5]
Coen 3 Introduction lecture [1.5], meeting [1], literature research [1]
Max 5.5 Introduction lecture [1.5], meeting [1], literature research [3]
Venislav 5 Introduction lecture + meetings [3], literature research [2]

Week 2

Name Total hours Tasks
Rick 3 Meeting [1], Research [2]
Coen 4 Meeting [2], Wiki editing [0.5], Google Forms implementation [0.5], Paper research [1]
Max 8 Meeting [1], literature research [1], rewriting USE[2], writing Implementation of facial data in chatbot [4]
Venislav 7 Meeting [2], researching face detection and emotion recognition papers[5]

Week 3

Name Total hours Tasks
Rick 2 Reading / Research [2]
Coen 3 Literature[2] Writing Requirements[1]
Max 4 Searched for relevant papers about chatbots and how to implement them[4]
Venislav 6 Researching face detection and emotion recognition frameworks[6]

Week 4

Name Total hours Tasks
Rick 4 Wiki (re)writing problem statement & introduction [2], Meeting [2]
Coen 9 Literature about robot design, user requirements and USE groups[5]. Editing USE and Requirements[2]. Working on robot design[2].
Max 12 Literature study about Valence/Arousal diagram[2]. Finding information about goal of emotion detection[4], Finding writing about emotion regulation and general emotion[6]
Venislav 13 Trying multiple methods for emotion recognition[12], meeting [1]

Week 5

Name Total hours Tasks
Rick 5 Meeting [2], Running emotional recognition simulations [3]
Coen 8 Literature[5] Writing Requirements & Scenarios [2] Meeting[1]
Max 9 Making framework[1] Reading paper about emotion[3] thinking about implementation of chatbot[4] meeting[1]
Venislav 14 Implementing real-time emotion recognition[12], meeting[2]

Week 6

Name Total hours Tasks
Rick 8 Creating & distributing survey [6], meetings [2]
Coen 9 chatbots: research, documentation and installation[7], meetings[2]
Max 20 Making name for project[0.5] Trying and making framework for chatbot[17] meetings[2.5]
Venislav 10 Implementing multiple datasets to work with xception and training models [8], meetings[2]

Week 7

Name Total hours Tasks
Rick
Coen
Max 25 Work on SIML[17], Update wiki[4], work on framework[3], meetings[1]
Venislav 8 Implementing final STAT version of emotion recognition[7], meetings[1]

Week 8

Name Total hours Tasks
Rick
Coen
Max 19 Work on visual studio[14], work on SIML[4] meeting[1]
Venislav 24 Computing and sending arousal/valence values to chatbot[3], Designing User Interface[20], meeting[1]

Week 9

Name Total hours Tasks
Rick 16 Meetings [4], Wiki-editing [4], Writing patterns for chatbot [4] Presentation (also editing) [4]
Coen 23 chatbot research[4]. chatbot concept implementation[5], presentation[4], meetings[2], wiki[8]
Max 32 Making final framework for chatbot[14], make patterns for chatbot[13], make demo[5]
Venislav 25 Integrating chatbot with emotion recognition[20], Working on Demo + discussions[5]

Week 10

Name Total hours Tasks
Rick 2 Wiki work [2]
Coen 3.5 Meeting[2] Wiki[1.5]
Max 10 Finalize wiki part chatbot[10]
Venislav 20 Wiki work (everything face/emotion recognition, integration with chatbot) + discussions [20]

Peer Review

Delta Score
Rick - 0.4
Coen - 0.2
Max + 0.3
Venislav + 0.3
Total average 0

References

  1. World Health Organisation. who.int. Ageing and Health. https://who.int/news-room/fact-sheets/detail/ageing-and-health
  2. Misra N. Singh A.(2009, June) Loneliness, depression and sociability in old age, referenced on 27/04/2020
  3. Green B. H, Copeland J. R, Dewey M. E, Shamra V, Saunders P. A, Davidson I. A, Sullivan C, McWilliam C. Risk factors for depression in elderly people: A prospective study. Acta Psychiatr Scand. 1992;86(3):213–7. https://www.ncbi.nlm.nih.gov/pubmed/1414415
  4. Rukuye Aylaz, Ümmühan Aktürk, Behice Erci, Hatice Öztürk, Hakime Aslan. Relationship between depression and loneliness in elderly and examination of influential factors. Archives of Gerontology and Geriatrics. 2012;55(3):548-554. https://doi.org/10.1016/j.archger.2012.03.006.
  5. Zhentao Liu, Min Wu, Weihua Cao, Luefeng Chen, Jianping Xu, Ri Zhang, Mengtian Zhou, Junwei Mao. A Facial Expression Emotion Recognition Based Human-robot Interaction System. IEEE/CAA Journal of Automatica Sinica, 2017, 4(4): 668-676 http://html.rhhz.net/ieee-jas/html/2017-4-668.htm
  6. Saleh, S.; Sahu, M.; Zafar, Z.; Berns, K. A multimodal nonverbal human-robot communication system. In Proceedings of the Sixth International Conference on Computational Bioengineering, ICCB, Belgrade, Serbia, 4–6 September 2015; pp. 1–10. http://html.rhhz.net/ieee-jas/html/2017-4-668.htm
  7. State-of-the-Art Facial Expression Recognition, https://paperswithcode.com/task/facial-expression-recognition
  8. http://www.square-bear.co.uk/mitsuku/home.htm
  9. Vincentian Collaborative System: Measuring PARO's impact, with an emphasis on residents with Alzheimer's disease or dementia., 2010
  10. Huisman, C., Helianthe, K., Two-Year Use of Care Robot Zora in Dutch Nursing Homes: An Evaluation Study, 2019, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6473570/
  11. Olaronke, I., Olawaseun, O., Rhoda, I., State Of The Art: A Study of Human-Robot Interaction in Healthcare 2017, https://www.researchgate.net/publication/316717436_State_Of_The_Art_A_Study_of_Human-Robot_Interaction_in_Healthcare
  12. Pandey, A., Gelin, R., A Mass-Produced Sociable Humanoid Robot Pepper: The First Machine of Its Kind, 2018
  13. Viola, Paul & Jones, Michael. (2001). Rapid object detection using a boosted cascade of simple features. Comput. Vis. Pattern Recog. 1.
  14. haarcascade_frontalface_default.xml, https://github.com/opencv/opencv/tree/master/data/haarcascades
  15. Lucey, Patrick & Cohn, Jeffrey & Kanade, Takeo & Saragih, Jason & Ambadar, Zara & Matthews, Iain. (2010). The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, CVPRW 2010. 94 - 101. 10.1109/CVPRW.2010.5543262.
  16. Meng, Debin & Peng, Xiaojiang & Wang, Kai & Qiao, Yu. (2019). Frame Attention Networks for Facial Expression Recognition in Videos. 3866-3870. 10.1109/ICIP.2019.8803603.
  17. State-of-the-Art Facial Expression Recognition on Extended Cohn-Kanade Dataset, https://paperswithcode.com/sota/facial-expression-recognition-on-ck
  18. Chollet, Francois. (2017). Xception: Deep Learning with Depthwise Separable Convolutions. 1800-1807. 10.1109/CVPR.2017.195.
  19. Mendes, Maksat,. (2019). Improved Facial Expression Recognition with Xception Deep Net and Preprocessed Images. Applied Mathematics & Information Sciences. 13. 859-865. 10.18576/amis/130520.
  20. fer2013 dataset, https://www.kaggle.com/deadskull7/fer2013
  21. Arriaga, Octavio & Valdenegro, Matias & Plöger, Paul. (2017). Real-time Convolutional Neural Networks for Emotion and Gender Classification.
  22. Facial Expression Research Group 2D Database (FERG-DB), http://grail.cs.washington.edu/projects/deepexpr/ferg-2d-db.html
  23. Aneja, Deepali & Colburn, Alex & Faigin, Gary & Shapiro, Linda & Mones, Barbara. (2016). Modeling Stylized Character Expressions via Deep Learning. 10.1007/978-3-319-54184-6_9.
  24. Minaee, Shervin & Abdolrashidi, Amirali. (2019). Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network.
  25. State-of-the-Art Facial Expression Recognition on FERG, https://paperswithcode.com/sota/facial-expression-recognition-on-ferg
  26. Pramerdorfer, Christopher & Kampel, Martin. (2016). Facial Expression Recognition using Convolutional Neural Networks: State of the Art.
  27. State-of-the-Art Facial Expression Recognition on FER2013, https://paperswithcode.com/sota/facial-expression-recognition-on-fer2013
  28. Goodfellow, Ian & Erhan, Dumitru & Carrier, Pierre & Courville, Aaron & Mirza, Mehdi & Hamner, Ben & Cukierski, Will & Tang, Yichuan & Thaler, David & Lee, Dong-Hyun & Zhou, Yingbo & Ramaiah, Chetan & Feng, Fangxiang & Li, Ruifan & Wang, Xiaojie & Athanasakis, Dimitris & Shawe-Taylor, John & Milakov, Maxim & Park, John & Bengio, Y.. (2013). Challenges in Representation Learning: A Report on Three Machine Learning Contests. Neural Networks. 64. 10.1016/j.neunet.2014.09.005.
  29. Simonyan, Karen & Zisserman, Andrew. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 1409.1556.
  30. U. K. Premasundera and M. C. Farook, "Knowledge Creation Model for Emotion Based Response Generation for AI," 2019 19th International Conference on Advances in ICT for Emerging Regions (ICTer), Colombo, Sri Lanka, 2019, pp. 1-7, doi: 10.1109/ICTer48817.2019.9023699.
  31. K. Lagattuta and H. Wellman, "Thinking about the Past: Early Knowledge about Links between Prior Experience, Thinking, and Emotion", Child Development, vol. 72, no. 1, pp. 82-102, 2001. Available: 10.1111/1467-8624.00267
  32. Werner, K., Gross, J.J.: Emotion regulation and psychopathology: a conceptual framework. In: Emotion Regulation and Psychopathology, pp. 13–37. Guilford Press (2010)
  33. Lazarus, R. S. (1966). Psychological stress and the coping process. New York: McGrawHill.
  34. Johansson-Pajala, R., Thommes, K., Hoppe, J.A. et al. Care Robot Orientation: What, Who and How? Potential Users’ Perceptions. Int J of Soc Robotics (2020). https://doi.org/10.1007/s12369-020-00619-y
  35. Maja Pantic and Marian Stewart Bartlett (2007). Machine Analysis of Facial Expressions, Face Recognition, Kresimir Delac and Mislav Grgic (Ed.), ISBN: 978-3-902613-03-5, InTech, Available from: http://www.intechopen.com/books/face_recognition/machine_analysis_of_facial_expressions
  36. Ekman P.(2017, August) My Six Discoveries, Referenced on 6/05/2020.https://www.paulekman.com/blog/my-six-discoveries/
  37. Marmpena, Mina & Lim, Angelica & Dahl, Torbjorn. (2018). How does the robot feel? Perception of valence and arousal in emotional body language. Paladyn, Journal of Behavioral Robotics. 9. 168-182. 10.1515/pjbr-2018-0012.
  38. Gunes, H., & Pantic, M. (2010). Automatic, Dimensional and Continuous Emotion Recognition. International Journal of Synthetic Emotions (IJSE), 1(1), 68-99. doi:10.4018/jse.2010101605
  39. Maital Neta, F. Caroline Davis, and Paul J. Whalen(2011, December) Valence resolution of facial expressions using an emotional oddball task, Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3334337/#__ffn_sectitle/
  40. Glancy, G., & Saini, M. A. (2005). An evidenced-based review of psychological treatments of anger and aggression. Brief Treatment & Crisis Intervention, 5(2).
  41. Joseph Weizenbaum. 1966. ELIZA—a computer program for the study of natural language communication between man and machine. Commun. ACM 9, 1 (Jan. 1966), 36–45. DOI:https://doi.org/10.1145/365153.365168
  42. Synthetic Intelligence Network, Synthetic Intelligence Markup Language Next-generation Digital Assistant & Bot Language. https://simlbot.com/
  43. tkinter - Python interface to Tcl/Tk, https://docs.python.org/3/library/tkinter.html