PRE2024 3 Group8
Margot Dijkstra, Llywelyn Vrouenraets, Sem Schreurs, Vladis Michail, Alessia-Maria Postelnicu, Sebastian Ciulacu
==Robot-assisted vocabulary learning==
Our approach, milestones, and deliverables can be found on the first page, called Deadlines, of the [https://tuenl-my.sharepoint.com/:x:/g/personal/v_michail_student_tue_nl/ESX_K6KzkbdCi65KJhRhZZsBclC4tfsQ2a7p3e0g6pCuIQ?e=aJpRHP logbook].


=== <big><u>Abstract</u></big> ===
In this paper, we investigate robot-assisted novel vocabulary learning, asking whether differences in the tutoring agent's embodiment lead to differences in performance on a vocabulary retention test. We also examine factors that may contribute to higher retention scores, including experienced cognitive load, the perceived social presence of the robot, and general impressions of the robot. A within-subject design was used: 21 participants completed a vocabulary learning task with words from the artificial language Vimmi. Participants learned both from a physically present robot, Misty, and from video recordings of the same robot presented on a laptop screen. We found no statistically significant evidence that either embodiment condition led to better test performance. The post-test questionnaires revealed that, in the physical robot condition, participants who rated the robot as more likeable tended to score higher. Additionally, differences in safety perception emerged: the physically present robot was perceived as less safe than the one presented on a screen.
''Names (id):'' Margot Dijkstra (1893793), Llywelyn Vrouenraets (1879790), Sem Schreurs (1809539), Vladis Michail (1792814), Alessia-Maria Postelnicu (1839330), Sebastian Ciulacu (1886711)
''Supervisors'': Elena Torta (e.torta@tue.nl), Raymond Cuijpers (r.h.cuijpers@tue.nl), Mel Sexton (m.sexton@tue.nl)
===<big><u>Introduction</u></big>===
====Problem statement and objectives====
It is no secret that there is a shortage of teachers in education. School districts are struggling to find certified teachers, especially in world languages (Hanford, 2017<ref>Hanford, E. (2017). Schools in poor, rural districts are the hardest hit by nation’s growing teacher shortage. ''APM Reports''. Retrieved from https://www.apmreports.org/story/2017/08/28/rural-schools-teacher-shortage </ref>; Koerting, 2017<ref>Koerting, K. (2017). Schools confront shortage of world language teachers. ''News Times''. Retrieved from http://www.newstimes.com/local/article/Schools-confront-shortage-of-world-language-10996278.php </ref>; Motoko, 2015<ref>Motoko, R. (2015). Teacher Shortages Spur a Nationwide Hiring Scramble (Credentials Optional). ''New York Times''. Retrieved from http://www.nytimes.com/2015/08/10/us/teacher-shortages-spur-a-nationwide-hiring-scramble-credentials-optional.html?_r=0 </ref>). Eindhoven University of Technology (TU/e) faces this problem as well. A search on Osiris revealed that demand for Dutch courses exceeds what the TU/e currently provides. Additionally, the course registration page shows that most timeslots are already full early in the quartile preceding the one in which the classes are scheduled. Furthermore, students taking Dutch in the third quartile received an email in the first week of the course urging them to sign up already for the follow-up course in the fourth quartile due to "very high demand". As a result, international students who want to take the Dutch classes provided by the university are disappointed time and again, as demand simply cannot be met: there are far more interested students than available places. Students who do get into a class usually find it overcrowded, with little one-to-one time with the tutor.
Those who do not end up on a waitlist with no clear indication of whether they will be able to sign up for a language course in the desired quartile.
To tackle this issue, we looked towards human-technology interaction (HTI) and robotics. We believe robotics can help address this problem by assisting tutors in their teaching: while the robot takes over certain tasks, tutors can divide their attention more effectively, increasing the opportunities for quality learning. We aim to help increase students' opportunities to attend foreign language classes, even when student demand is high and the supply of tutors is low.
====Who are the users?====
University students often face high cognitive loads throughout their studies, so it is crucial to support them with efficient learning and retention strategies. Cognitive load theory suggests that when a student’s mental load becomes very high, it may overwhelm their cognitive capacity, impairing learning and information retention (Sweller, 1988)<ref>Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. ''Cognitive Science, 12'', 257-285.</ref>. Our problem statement focuses on international students at the TU/e who want to learn Dutch, but the concept applies to a much wider audience: any university with a large number of international students. The main user group that would benefit from our findings, and from any future development guidelines that stem from them, is therefore broader: any university student learning a foreign language other than the language in which they study.
====What do they require?====
University students require opportunities to learn the language of the country in which they are staying. We believe that giving these students the opportunity to learn (predominantly vocabulary) through a robotic tutor will increase the university's capacity to meet this demand. In this way, more students would be able to learn the local language, aiding their integration into the culture.
The main user requirements we aim to address in this research are supervision and smaller groups. Supervision has been found to be very influential, as it increases the general effectiveness of learning (Holloway, 1992)<ref>Holloway, E. L. (1992). Supervision: A way of teaching and learning. In S. D. Brown & R. W. Lent (Eds.), ''Handbook of counseling psychology'' (2nd ed., pp. 177–214). John Wiley & Sons.</ref>. It allows for structured and repeated learning, giving students a clear way of doing things as well as a clear timeslot in which to do them. This is why we expect robotic tutors to be more effective than language-learning apps such as Duolingo, which provide supervision through mobile notifications rather than physical presence. Jones (2007)<ref>Jones, R. W. (2007). Learning and Teaching in Small Groups: Characteristics, Benefits, Problems and Approaches. ''Anaesthesia and Intensive Care'', ''35''(4), 587–592. https://doi.org/10.1177/0310057x0703500420 </ref> advocates smaller groups because they promote active participation and deep rather than surface learning. However, Jones also notes that small groups typically require greater investment in key resources such as staff and meeting rooms.
==== State of the Art ====
In recent years, advances in robotics have led to the development of social language-learning robots designed to enhance the way people learn new languages. Currently, the most prominent state-of-the-art robots of this kind target young and school-aged children as their main user base. Among the most advanced in this field are EMYS, Elias, and SkoBots, each offering a unique approach to interactive language education.
EMYS is an expressive robot designed specifically for young, mainly bilingual, language learners, typically aged 3 to 7. It has a mechanized face made of three moving discs, somewhat resembling a turtle, that allows it to display emotions. The robot comes with sets of picture cards, for example of animals, that children can use to expand their vocabulary (''EMYS Robot + 3 Sets'', 2025)<ref>''EMYS robot + 3 sets''. (2025). EMYS. <nowiki>https://www.emys.co/product-page/emys-robot-3-sets</nowiki></ref>.
[[File:EMYS.png|center|thumb|382x382px|Figure 1: EMYS. <ref>https://www.emys.co/product-page/emys-robot-3-sets</ref>]]
Elias Robot, on the other hand, is designed primarily for classroom environments and provides AI-powered conversational practice in multiple languages. Its hardware is essentially that of a Nao robot. It integrates with educational systems and tablets, allowing teachers to customize lessons and track student progress via a dedicated app. It appears to have more functions than EMYS, as it can complete more language-learning-oriented tasks, such as pronouncing a specific sentence aloud, and it can perform a range of physical movements, such as dancing. Still, EMYS appears to be more emotionally expressive, while Elias is more bodily expressive (''Elias Robot | Elias Robot'', 2022)<ref>''Elias Robot | Elias Robot''. (2022). Elias Robot. <nowiki>https://www.eliasrobot.com/elias-robot-app</nowiki></ref>.
[[File:Elias Robot.png|center|thumb|685x685px|Figure 2: Elias robot.<ref>https://www.eliasrobot.com/elias-robot-app</ref>]]
SkoBots represent a more niche, yet more exciting, venture in language-learning robots than the previous two. A SkoBot is a 3D-printed companion that sits on the user’s shoulder and helps its mainly Native American user base learn a Native American language. SkoBots also cater to a wider range of users, including older students and independent learners: most of the sample videos show the robots helping Native American teens strengthen their vocabulary and providing dynamic, real-time corrections. Admirable aspects of this robot include its unique design, ease of assembly, and low price point, which make it accessible to marginalized communities (''SkoBots Language Learning'', n.d.)<ref>''SkoBots Language Learning''. (n.d.). The STEAM Connection. <nowiki>https://www.steamconnection.org/skobots</nowiki></ref>.
[[File:Skobots.png|center|thumb|452x452px|Figure 3: SkoBots.<ref>https://www.carasantamaria.com/podcast/danielle-boyer</ref>]]
==== Literature Background ====
The main mediating factors for the success of learning with an artificial tutor appear to be the cognitive load experienced by the learner and the perceived social presence of the tutoring agent. Virtual humans may add extra processing demands to the environment in the form of visual or auditory distraction, thus increasing cognitive load (Craig & Schroeder, 2017)<ref>Craig, S. D., & Schroeder, N. L. (2017). Reconsidering the voice effect when learning from a virtual human. ''Computers & Education'', ''114'', 193–205. <nowiki>https://doi.org/10.1016/j.compedu.2017.07.003</nowiki></ref>. Studies such as Wainer et al. (2006)<ref name=":2">Wainer, J., Feil-Seifer, D., Shell, D., & Matarić, M. (2006). The role of physical embodiment in human-robot interaction. ''ROMAN 2006 - the 15th IEEE International Symposium on Robot and Human Interactive Communication''. <nowiki>https://doi.org/10.1109/roman.2006.314404</nowiki></ref> and Seeger et al. (2018)<ref name=":3">Seeger, A.-M., Pfeiffer, J., & Heinzl, A. (2018). ''Designing Anthropomorphic Conversational Agents: Development and Empirical Evaluation of a Design Framework''. <nowiki>https://web.archive.org/web/20220802070748id_/https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1103&context=icis2018</nowiki></ref> further show that physical embodiment enhances social presence and perception. Increases in social presence, in turn, have been positively related to the impact and retention of presented information in online learning environments (Tu & McIsaac, 2002)<ref>Tu, C.-H., & McIsaac, M. (2002). The Relationship of Social Presence and Interaction in Online Classes. ''American Journal of Distance Education'', ''16''(3), 131–150. <nowiki>https://doi.org/10.1207/s15389286ajde1603_2</nowiki></ref>.
In previous research, pedagogical agents were found to improve learning outcomes compared with non-embodied conditions, including static agents and no-agent conditions. Embodied agents, which possess human-like characteristics such as facial expressions, gestures, lip synchronization, and body sway, significantly increase retention scores (Davis et al., 2022)<ref>Davis, R. O., Park, T., & Vincent, J. (2022). A Meta-Analytic Review on Embodied Pedagogical Agent Design and Testing Formats. ''Journal of Educational Computing Research'', ''61''(1), 30-67. https://doi.org/10.1177/07356331221100556 </ref>. Further support comes from Mayer and DaPra (2012)<ref>Mayer, R. E., & DaPra, C. S. (2012). An embodiment effect in computer-based learning with animated pedagogical agents. ''Journal of Experimental Psychology: Applied, 18''(3), 239–252. https://doi.org/10.1037/a0028616 </ref>, who found an embodiment effect: learners performed better on a transfer test when a human-voiced agent displayed human-like gestures, facial expressions, eye gaze, and body movement than when it did not. Participants in the study by Fiorini et al. (2024)<ref>Fiorini, L., D'Onofrio, G., Sorrentino, A., Cornacchia Loizzo, F., Russo, S., Ciccone, F., Giuliani, F., Sancarlo, D., & Cavallo, F. (2024). The role of coherent robot behavior and embodiment in emotion perception and recognition during human-robot interaction: Experimental study. ''JMIR Human Factors'', ''11'', e45494. https://doi.org/10.2196/45494 </ref> reported greater arousal and dominance when interacting with embodied robots than with voice-only interfaces. In a similar vein, Dennler et al. (2024)<ref>Dennler, N. S., Nikolaidis, S., & Matarić, M. (2024). Singing the Body Electric: The Impact of Robot Embodiment on User Expectations. ''ArXiv, abs/2401.06977''.</ref> suggest that embodiment increases perceived capability, which may affect information retention.
These findings suggest that higher expectations may lead to increased engagement, but also to disappointment if they are not met. In their meta-analysis, Ouyang and Xu (2024)<ref>Ouyang, F., & Xu, W. (2024). The effects of educational robotics in STEM education: A multilevel meta-analysis. ''International Journal of STEM Education'', ''11'', 7. https://doi.org/10.1186/s40594-024-00469-4 </ref> argue that, instead of using robots to convey knowledge directly, instructors should use educational robotics to facilitate students’ learning experience, acting as facilitators who provide guidance and support.
More specific to robot-assisted language learning (RALL), Van Den Berghe et al. (2018)<ref name=":0">Van Den Berghe, R., Verhagen, J., Oudgenoeg-Paz, O., Van Der Ven, S., & Leseman, P. (2018). Social Robots for Language Learning: A Review. ''Review of Educational Research'', ''89''(2), 259–295. <nowiki>https://doi.org/10.3102/0034654318821286</nowiki></ref> suggest in their review that children may learn words from robotic agents as well as from human teachers or with assistance from their peers. Zinina et al. (2023)<ref>Zinina, A., Kotov, A., Arinkin, N., & Zaidelman, L. (2023). Learning a foreign language vocabulary with a companion robot. ''Cognitive Systems Research'', ''77'', 110–114. <nowiki>https://doi.org/10.1016/j.cogsys.2022.10.007</nowiki></ref> studied university-aged linguistics students who practiced vocabulary in Latin (a language foreign to them) and then evaluated their experience and the robot's performance as an assistant tutor. The students reported a positive impression of the robot, increased motivation, and a desire to use robot-assisted learning in the future. The review of RALL for adults by Deng et al. (2024)<ref name=":1">Deng, Q., Fu, C., Ban, M., & Iio, T. (2024). A systematic review on robot-assisted language learning for adults. ''Frontiers in Psychology'', ''15''. <nowiki>https://doi.org/10.3389/fpsyg.2024.1471370</nowiki></ref> suggests that interacting with robots allows learners to speak in a low-stress environment, improving fluency and confidence. The review also found that being taught by a physical robot, rather than one on video, enhanced performance on a vocabulary test and improved engagement.
==== Research Aim and Hypothesis ====
In the present study, we investigated the effect of two levels of embodiment of an artificial tutor on the recall of novel vocabulary. Specifically, we compared a physical robot condition with a condition showing videos of the same robot on a screen. This led to the following research question: does the level of embodiment of the tutoring agent affect novel vocabulary retention? Based on previous research, we hypothesized that the physical robot tutor condition would result in higher novel vocabulary retention than the screen-based condition.
===<big><u>Methods</u></big>===
==== Participants & design ====
The sample included 21 students aged 18 to 26 (mean age 21.1 years). Of the participants, 7 were female and 1 identified as other. We chose a within-participant design to control for individual differences. One condition used a physical robot, and the other used a video recording of the same robot. We used Vimmi, an artificial language created for research purposes that does not resemble any existing language (Macedonia et al., 2010)<ref>Macedonia, M., Mueller, K., & Friederici, A. (2010). Neural Correlates of High Performance in Foreign Language Vocabulary Learning. ''Mind, Brain, and Education'', ''4'', 125–134. https://doi.org/10.1111/j.1751-228X.2010.01091.x </ref>. Misty II served as the tutor robot (''Misty II | Misty Robotics'', n.d.)<ref>''Misty II | Misty Robotics''. (n.d.). Misty Robotics. https://www.mistyrobotics.com/misty-ii</ref>. Ten participants received Misty as their tutor first, and 11 received the laptop with video recordings of Misty first, to minimize potential order effects. To avoid confounding effects from the presence of a peer, each participant completed the task individually.
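The counterbalancing described above (10 participants robot-first, 11 screen-first) can be produced with a seeded shuffle. The Python sketch below only illustrates one way to obtain such a split; the participant codes and the `assign_condition_orders` helper are hypothetical, and the text does not state how condition orders were actually assigned.

```python
import random


def assign_condition_orders(participant_ids, seed=None):
    """Counterbalance condition order across participants (hypothetical helper).

    Participants are shuffled, then the first half (rounded down) gets the
    robot first and the remaining participants get the screen first.
    """
    rng = random.Random(seed)  # seeded generator so the assignment is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    n_robot_first = len(ids) // 2
    orders = {}
    for position, pid in enumerate(ids):
        if position < n_robot_first:
            orders[pid] = ("robot", "screen")
        else:
            orders[pid] = ("screen", "robot")
    return orders


# Hypothetical participant codes P01..P21
orders = assign_condition_orders([f"P{i:02d}" for i in range(1, 22)], seed=42)
robot_first = sum(1 for order in orders.values() if order[0] == "robot")
print(robot_first, len(orders) - robot_first)  # 10 11
```

With an odd sample, the extra participant falls into the screen-first group, which matches the 10/11 split reported above.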
==== Procedure & setting ====
We used the Wizard of Oz method to control the robot and the display from a different room. After being introduced to five Vimmi words by the artificial tutor, participants took an oral test in which the tutor said a word in Vimmi and they had to answer with its correct English translation. As mentioned before, we used Misty for the embodied condition, as this robot is equipped with an LED screen that allows facial expressions with eyes and eyebrows, as well as head and arm movements. The robot is 35.5 cm tall, and its arm can rotate only 180 degrees. Because the researchers were located in another room, we used a microphone setup so that we could hear participants in case something went wrong or they needed help. During the experiment and the questionnaires, the participant was alone in the room.
==== Measurements ====
After each condition, participants filled in a questionnaire containing the Cognitive Load Questionnaire (CLQ) (Paas, 1992)<ref>Paas, F. G. W. C. (1992). Training strategies for attaining transfer of problem-solving skill in statistics: A cognitive-load approach. ''Journal of Educational Psychology, 84''(4), 429–434. https://doi.org/10.1037/0022-0663.84.4.429 </ref>, the Robot Social Presence Questionnaire (RSPQ) (Chen et al., 2023)<ref>Chen, N., Liu, X., Zhai, Y. ''et al.'' Development and validation of a robot social presence measurement dimension scale. ''Sci Rep'' 13, 2911 (2023). https://doi.org/10.1038/s41598-023-28817-4 </ref>, and a modified Godspeed Questionnaire (GQ) (Bartneck et al., 2009<ref>Bartneck, C., Kulić, D., Croft, E. ''et al.'' Measurement Instruments for the Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety of Robots. ''Int J of Soc Robotics'' 1, 71–81 (2009). https://doi.org/10.1007/s12369-008-0001-3 </ref>; Carpinella et al., 2017<ref>Carpinella, C. M., Wyman, A. B., Perez, M. A., & Stroessner, S. J. (2017). The Robotic Social Attributes Scale (RoSAS): Development and Validation. ''2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI)'', Vienna, Austria, pp. 254-262.</ref>). The CLQ was used to determine whether participants experienced different levels of cognitive load depending on the tutoring agent. We expected that the video recording of the robot might cause extraneous cognitive load and, in turn, reduce the cognitive resources available to the learner for integrating information with existing knowledge structures, because virtually presented agents could add processing demands in the form of visual or auditory distraction (Craig & Schroeder, 2017)<ref>Craig, S. D., & Schroeder, N. L. (2017). Reconsidering the voice effect when learning from a virtual human. ''Computers & Education'', ''114'', 193-205. https://doi.org/10.1016/j.compedu.2017.07.003 </ref>. The RSPQ was included to examine whether participants judged the social presence of the robot tutor differently from that of the on-screen agent. Lower social presence has been associated with reduced impact and retention of presented information in online learning environments (Tu & McIsaac, 2002)<ref>Tu, C. H., & McIsaac, M. (2002). The Relationship of Social Presence and Interaction in Online Classes. ''American Journal of Distance Education'', ''16''(3), 131–150. https://doi.org/10.1207/S15389286AJDE1603_2 </ref>. We therefore measured whether physically embodying the tutor in the robot increases perceived social presence, which could in turn increase information retention. Lastly, the modified GQ, drawing on RoSAS, was used to measure participants' general impression of the robot across the dimensions of Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety. This allowed us to assess whether participants enjoyed the experience with either agent in the first place and whether their judgments were related to their performance on the learning task. The survey data were collected using lab.js.
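Before any of the tests can be run, each participant's item ratings have to be turned into one score per dimension and condition. The sketch below assumes that a dimension score is the mean of its items and that every item is already coded so that higher means more positive (reverse-keyed items would need recoding first). The item names and the `DIMENSIONS` mapping are hypothetical placeholders, not the actual questionnaire items.

```python
from statistics import mean

# Hypothetical item-to-dimension mapping; real item names come from the questionnaires.
DIMENSIONS = {
    "likeability": ["like_1", "like_2", "like_3"],
    "perceived_safety": ["safety_1", "safety_2"],
}


def dimension_scores(responses, dimensions=DIMENSIONS, dropped=()):
    """Mean rating per dimension for one participant in one condition.

    Items listed in `dropped` (for example, items removed after a
    reliability check) are left out of the average.
    """
    return {
        dim: mean(responses[item] for item in items if item not in dropped)
        for dim, items in dimensions.items()
    }


# One hypothetical participant's ratings on a five-point scale
example = {"like_1": 4, "like_2": 5, "like_3": 3, "safety_1": 2, "safety_2": 3}
scores = dimension_scores(example)
print(scores)  # {'likeability': 4, 'perceived_safety': 2.5}
```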
==== Statistical analysis ====
For the statistical testing, we first used Cronbach's alpha to assess the reliability of the GQ and RSPQ dimensions. We then conducted paired t-tests to examine whether the vocabulary test scores, the CLQ scores, and the RSPQ and GQ dimension scores differed between the two conditions. Cohen’s d was used to quantify effect sizes. Finally, linear regressions were performed to determine whether any of the measured variables significantly predicted the test score.
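The paired t-test compares the two conditions through each participant's difference score: the mean difference divided by its standard error, on n - 1 degrees of freedom. A minimal standard-library sketch of that statistic follows; `paired_t` is a hypothetical helper and the scores are invented for illustration, not the study's data (the reported analysis was run in Stata).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(cond_a, cond_b):
    """Paired-samples t statistic for two conditions measured on the same people.

    Returns (t, df) with df = n - 1.
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("each participant needs a score in both conditions")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Illustrative vocabulary scores (out of 5) for five hypothetical participants
robot = [3, 4, 2, 5, 4]
screen = [4, 4, 3, 5, 5]
t, df = paired_t(robot, screen)
print(round(t, 3), df)  # -2.449 4
```

The p-value then follows from Student's t distribution with df degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel` returns both the statistic and the p-value in one call. The helper raises an error when all differences are identical, since the standard deviation is then zero and t is undefined.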
===<big><u>Results</u></big>===
We investigated whether there is a relationship between the embodiment of a robot tutor and vocabulary retention. The main statistical tool in this analysis was the paired t-test. All hypothesis tests were conducted in Stata/BE 18.
====Reliability of the Questionnaires====
=====The Godspeed Questionnaire=====
We conducted a Cronbach's Alpha analysis per dimension to assess the reliability of the questionnaire dimensions in our dataset. Our study included the Godspeed questionnaire, which has five dimensions (Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety), and the Social Presence questionnaire.
In the Godspeed questionnaire, the Anthropomorphism dimension had acceptable reliability (α = 0.7587). The Naturalness item (high means natural, low means fake) showed the highest item-test correlation (0.8733), and the Lifelikeness item (high means lifelike, low means artificial) the lowest (0.7442). These results suggest that all items contribute meaningfully to the dimension; therefore, no items were removed.
Although Lifelikeness had the lowest item-test correlation within Anthropomorphism, it had the highest (0.7791) within the Animacy dimension. The lowest-correlated item in Animacy was Aliveness (higher means more alive, lower means more dead), with a correlation of 0.6720. The Animacy dimension has moderate internal consistency (α = 0.6833). Given the exploratory scope of this study, we considered the dimension sufficiently reliable and retained it, but future applications might improve its reliability by revising or replacing some items.
Likeability (α = 0.8804) and Perceived Safety (α = 0.8171) showed the highest reliability in the Godspeed questionnaire. All items from these dimensions were retained, as removing any of them would reduce alpha. For Likeability, the Niceness item showed the highest item-test correlation (0.8951); higher scores meant the robot seemed nicer, and lower scores meant it was perceived as awful. For Perceived Safety, the Relaxation item (higher means more relaxed, lower means more anxious) had the highest item-test correlation and the Safety item the lowest.
Perceived Intelligence initially had a low Cronbach’s alpha (0.6078). The Responsibility item (high means responsible, low means irresponsible) and the Sensibility item (high means sensible, low means foolish) were removed because of their low item-test correlations. After omitting these two items, the alpha of the Perceived Intelligence dimension improved to 0.7537.
=====The Social Presence Questionnaire=====
Chen et al. (2023) do not name the dimensions of the Social Presence Questionnaire, so for this exploration we assigned the following names: Dimension 1, Perceived Engagement with Social Robots; Dimension 2, Robot Collaboration and Adaptability; Dimension 3, Emotional Influence of Robots; and Dimension 4, Perceived Communication Ability. Dimension 5 remains unnamed because it contains only two items.
We considered the reliability of the first four dimensions acceptable for this exploratory study. Perceived Engagement with Social Robots had an alpha of 0.7948; all items contributed positively, and no removals were necessary. Robot Collaboration and Adaptability has borderline moderate reliability (α = 0.6564), which we considered acceptable for this paper, so all items were retained. Emotional Influence of Robots is slightly below the conventional 0.70 threshold (α = 0.6810), but again all items were retained for further analysis. Perceived Communication Ability has acceptable reliability (α = 0.7266), and removing any item would lower alpha, so all items were retained.
We excluded the 5<sup>th</sup> dimension from further analysis because it contains only two items and has extremely poor reliability (α = 0.0523).
After this reliability analysis, we proceeded to test the hypothesis formulated in the introduction.
====Hypothesis testing====
A paired t-test showed no statistically significant difference in mean scores between the two conditions (p = 0.086). However, the p-value indicates a trend (p < 0.10) toward higher scores in the screen condition (see Figures 4 and 5).
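To see what p = 0.086 means for the test statistic: assuming all 21 participants enter the paired test, df = 20, and the two-sided 0.10 and 0.05 critical values are about 1.725 and 2.086, so the reported p-value implies |t| between those two values. The sketch below turns a t statistic into a two-sided p-value without SciPy by integrating the Student's t density with Simpson's rule; `t_pdf` and `two_sided_p` are hypothetical helpers, and in practice `scipy.stats.t.sf` or Stata's own output would be used instead.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_coef = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_coef) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|) = 1 - 2 * P(0 < T < |t|).

    The integral of the density over [0, |t|] uses composite Simpson's rule.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    if steps % 2:  # Simpson's rule needs an even number of subintervals
        steps += 1
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


# Near the df = 20 critical values, the p-values land at about 0.05 and 0.10
print(round(two_sided_p(2.086, 20), 3), round(two_sided_p(1.725, 20), 3))
```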
[[File:Mean Scores in Screen and Robot Condition.png|center|thumb|568x568px|Figure 4: Mean Scores in Screen and Robot Condition]]
[[File:Individual Performance across Conditions.png|center|thumb|617x617px|Figure 5: Individual Performance across Conditions]]
For the CLQ, there was no significant difference between the robot and screen conditions (p = 0.309). In our data, the score was not significantly correlated with cognitive load in either the screen condition (p = 0.494) or the robot condition (p = 0.343).
In the GQ, Perceived Safety scores differed significantly between the two conditions (p = 0.007), indicating that the robot was perceived as safer in the screen condition than in the robot condition. No significant differences were found for the other dimensions: Animacy (p = 0.261), Anthropomorphism (p = 0.156), Likeability (p = 0.677), and Perceived Intelligence (p = 0.467). The effect sizes for each dimension can be found in Figure 6.
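Cohen's d has several variants for paired designs, and the text does not state which one produced the values in Figures 6 and 7. The sketch below shows two common ones: d_z, which divides the mean difference by the standard deviation of the difference scores, and d_av, which divides it by the average of the two conditions' standard deviations. The helper names and ratings are hypothetical.

```python
from statistics import mean, stdev


def cohens_dz(cond_a, cond_b):
    """d_z: mean of the paired differences divided by their standard deviation."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    return mean(diffs) / stdev(diffs)


def cohens_d_av(cond_a, cond_b):
    """d_av: mean difference divided by the average of the two conditions' SDs."""
    return (mean(cond_a) - mean(cond_b)) / ((stdev(cond_a) + stdev(cond_b)) / 2)


# Hypothetical Perceived Safety ratings for five participants (not the study's data)
screen_safety = [4, 4, 3, 5, 5]
robot_safety = [3, 4, 2, 5, 4]
dz = cohens_dz(screen_safety, robot_safety)
dav = cohens_d_av(screen_safety, robot_safety)
print(round(dz, 3), round(dav, 3))
```

The sign depends on the order of subtraction; here a positive value means higher ratings in the screen condition. When the two standard deviations are similar, d_z exceeds d_av whenever the correlation between conditions is above 0.5, so the two variants can differ noticeably in a within-participant design.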
[[File:Cohen's d for each dimension (Godspeed).png|center|thumb|594x594px|Figure 6: Cohen's d for each dimension (Godspeed questionnaire)]]
Linear regressions were also performed for each condition separately. Likeability emerged as the only significant predictor of the score in the robot condition (p = .004, coefficient = 1.63). There were no significant findings for the screen condition.
In the RSPQ, scores on the perceived presence dimension differed significantly between the two conditions (p = .0002), indicating that the physical robot was perceived as significantly more present than the screen version. Scores on the Emotional Atmosphere dimension also differed significantly (p = .008), suggesting that participants felt more of an emotional atmosphere with the physical robot than in the screen condition. The remaining dimensions, helpfulness & collaboration (p = .947), communication ability (p = .638), and distraction (p = .877), did not differ significantly between the two conditions. The effect sizes can be found in Figure 7.
[[File:Cohen's d for each dimension (Social presence).png|center|thumb|618x618px|Figure 7: Cohen's d for each dimension (Social Presence questionnaire)]]
===<big><u>Discussion</u></big> ===
====Main findings====
Given the trend indicating that students, on average, performed better with the screen version of Misty, a screen-based approach could be used for the time being. Van Den Berghe et al. (2018)<ref name=":0" /> suggest that children may learn words from robotic agents as well as from human teachers. In contrast, the review of RALL for adults by Deng et al. (2024)<ref name=":1" /> found that being taught by a physical robot, rather than one on video, enhanced vocabulary test performance and engagement; it also suggests that interacting with robots allows learners to speak in a low-stress environment, improving fluency and confidence. Because the difference in scores between our two conditions was not significant, our results are more in line with Van Den Berghe et al. (2018)<ref name=":0" />. Our findings do not support the suggestion by Craig and Schroeder (2017) that virtual humans, in our case the videos of Misty, add cognitive load, as we found no difference in cognitive load scores between the two conditions. They do, however, complement the studies by Wainer et al. (2006)<ref name=":2" /> and Seeger et al. (2018)<ref name=":3" />, as Misty was judged to be more socially present in the robot condition than in the screen condition. We further found that the physical robot was perceived as less safe than its on-screen counterpart, with a moderate effect size (Cohen’s d = 0.66), which contrasts with the low-stress environment described by Deng et al. (2024)<ref name=":1" />. Finally, likeability was a significant predictor of the score in the physical robot condition, with the score increasing by 1.63 points for every 1-point increase in likeability on the five-point Likert scale.
====Potential limitations====
Several limitations must be addressed. With a total of 21 participants, our sample was smaller than ideal for detecting subtle effects. Previous research indicates that embodiment increases test performance (Deng, Qi et al., 2024)<ref name=":1" />, which our findings do not replicate. Several factors could contribute to this. Most RALL studies did not use Misty as the tutor robot; Misty is a rather small robot with a visible camera, which several participants informally criticized. The Wizard of Oz method may also have introduced delays, since a human had to control the robot in real time, whereas in the video condition the researcher only had to press one button to advance to the next section. Additionally, the number of words was limited to five per condition, which means we may have tested participants' working memory rather than vocabulary retention. Future research should address these constraints by increasing the sample size and the number of words used, and by automating robot control.
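To make the sample-size limitation concrete, the sketch below uses the standard normal approximation for a two-sided paired comparison, n ≈ ((z_{1-α/2} + z_{1-β}) / d_z)², to estimate how many participants are needed for 80% power at α = .05. The function name and chosen effect sizes are illustrative assumptions; this is not an analysis run in the study, and the approximation slightly underestimates the exact paired t-test requirement.

```python
from math import ceil
from statistics import NormalDist

def paired_n_approx(d, alpha=0.05, power=0.80):
    """Approximate participants needed for a two-sided paired test.

    Normal approximation n = ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the within-subject effect size (Cohen's d_z).
    """
    z = NormalDist()  # standard normal distribution
    n = ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2
    return ceil(n)

for d in (0.8, 0.5, 0.3):
    print(f"d_z = {d}: about {paired_n_approx(d)} participants")
# prints 13, 32 and 88 participants respectively
```

Inverting the same formula, 21 participants give 80% power only for effects of roughly d_z ≥ 0.61 under these assumptions, so subtler embodiment effects could easily go undetected.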
====Implications of the study====
Given that there were no significant differences in scores, a screen-based approach could be used for language learning without a physical tutor for the time being. The absence of a score difference between the two conditions suggests that students may learn equally well from a physically present robot and from a video recording of the same robot. Participants felt safer when shown videos of the robot than when the robot was physically present; we therefore recommend future research to determine which specific factors make people feel unsafe. Furthermore, participants who performed better in the robot condition of the learning task also rated the robot as more likable. We therefore also recommend future research into which factors make an individual like a robot more, which may in turn improve their learning.
As of now, we cannot conclude that robot tutors can be used efficiently in language teaching settings, especially since online classes are possibly more convenient and less costly. However, further research into RALL for adults, into what makes robots good or bad tutors, and into how to design them accordingly may relieve some of the pressure on language tutors that the TU/e and many other institutions face.
===<big><u>Appendix</u></big> ===
====Appendix A: Logbook====
[https://tuenl-my.sharepoint.com/:x:/g/personal/v_michail_student_tue_nl/ESX_K6KzkbdCi65KJhRhZZsBclC4tfsQ2a7p3e0g6pCuIQ?e=aJpRHP logbook]
====Appendix B: Materials used for experiment:====
[https://drive.google.com/drive/folders/1taLsWeYRSYUWRDW4e923FvDFmaJDgQvA?usp=sharing Files]
==== Appendix C: Weekly updates====
=====Week 6 updates=====
Report progress:   


Week 5:   
''Research script: contains the research script we used for each participant in order to standardize the instructions.'' 
 
''Experimenter instructions: contains the instructions for the experimenters conducting the procedure. The second page of the document also has direct links to the Canva files we created for the "screen" condition.''  


''Experimental code: contains the scripts used for Misty, with the corresponding blocks either disabled or enabled.''  


Week 4:  
''Questionnaire: the zip file contains the HTML of the questionnaire we created in lab.js. Note that for the link to work, the HTML file must be in the same folder as the rest of the documents in the zip file, because of dependencies.'' 


''Videos: contains the unedited videos we filmed of Misty. These were then placed in the Canva slides linked in the experimenter instructions document.''
=====Week 5 updates=====
We started off the week by adding the final touches to the questionnaire, preparing the participants' schedule, writing the researcher script and finalizing the programming of Misty. We also filmed the robot for the on-screen condition. We then ran the experiment according to the schedule found in the Week 4 updates.
=====Week 4 updates=====
We experimented with the robot to decide which words would be used. All group members got acquainted with Misty and learned how to operate the robot. We reserved the rooms for the experiment; the sessions will take place on the 8<sup>th</sup> floor of the Atlas building on the 18<sup>th</sup>, 19<sup>th</sup>, 20<sup>th</sup> and 21<sup>st</sup> of March (during working hours). Details of the experiment have been settled, such as using 5 words per condition and randomizing which words are assigned to each condition. We realized that we should charge the robot before the experiment and included that in our planning. 15 participants have already been contacted about their availability. More people will be contacted during the weekend.  


|Monday
|Tuesday
|Wednesday
|Thursday
|Friday


|9:00-17:00 Running the experiments
|9:00-17:00 Running the experiments
|9:00-17:00 Running the experiments
|9:00-17:00 Running the experiments  


|9:00-17:00 Running the experiments
|9:00-17:00 Running the experiments
Upcoming meeting agenda: [https://tuenl-my.sharepoint.com/:w:/g/personal/v_michail_student_tue_nl/Eb4DTnyW57ZMrcjQUGehMvwB-j750dDjclVTweoopMku7g?e=Iziicl Meeting agenda Monday 2025-02-24.docx]


== Robot-assisted vocabulary learning ==
=====Week 3 Updates=====
 
=== Abstract ===
 
=== Introduction ===
 
==== Problem statement and objectives:   ====
It is no secret that there is a shortage of teachers in education. School districts are struggling to find certified teachers, especially in world languages (Hanford, 2017<ref>Hanford, E. (2017). Schools in poor, rural districts are the hardest hit by nation’s growing teacher shortage. ''APM Reports''. Retrieved from https://www.apmreports.org/story/2017/08/28/rural-schools-teacher-shortage </ref>; Koerting, 2017<ref>Koerting, K. (2017). Schools confront shortage of world language teachers. News Times. Retrieved from http://www.newstimes.com/local/article/Schools-confront-shortage-of-world-language-10996278.php </ref>; Motoko, 2015<ref>Motoko, R. (2015). Teacher Shortages Spur a Nationwide Hiring Scramble (Credentials Optional). New York Times. Retrieved from: http://www.nytimes.com/2015/08/10/us/teacher-shortages-spur-a-nationwide-hiring-scramble-credentials-optional.html?_r=0 </ref>). Eindhoven University of Technology (TU/e) faces this problem as well. A small investigation revealed the following: international students who want to take the Dutch classes offered by the university are repeatedly disappointed because demand simply cannot be met. There are far more interested students than available places in Dutch classes. Students who do get into a class usually find it overfilled, leaving little one-to-one time with the tutor. Those who do not are put on a waitlist with no clear indication of whether they will be able to sign up for a language course in the desired quartile.
 
To tackle this issue, we looked towards HTI and robotics. We believe robotics can help address this problem, as robots can assist tutors in their teaching. With robotic assistance, tutors can divide their attention while the robot takes over certain tasks, increasing the opportunities for quality learning. We aim to increase students' opportunities to attend foreign language classes, even at times of high demand for such classes and a low supply of tutors.
 
==== Who are the users? ====
University students often face high cognitive loads throughout their studies, so it is crucial to support them with efficient learning and retention strategies. Our problem statement focuses on international students at the TU/e who want to learn Dutch, but the concept applies to a much wider audience: any university with a large number of foreign students. The main user group that would benefit from our findings, and from any future development guidelines that stem from them, is therefore broader: any university student trying to learn a foreign language other than their language of study.
 
==== What do they require? ====
The university students require opportunities to learn the language of the country that they are staying in. We believe that giving these students the opportunity to learn (predominantly vocabulary) through a robot will increase the capacity of the university to meet this demand. In this way, more students would be able to learn the local language, aiding their integration into the culture.  
 
The main user requirements that we aim to address in this research are supervision and smaller groups. Supervision has been found to be very influential, as it increases the general effectiveness of learning (Holloway, 1992)<ref>Holloway, E. L. (1992). Supervision: A way of teaching and learning. In S. D. Brown & R. W. Lent (Eds.), ''Handbook of counseling psychology'' (2nd ed., pp. 177–214). John Wiley & Sons.</ref>. It enables structured and repeated learning, giving students a clear way of doing things as well as a clear timeslot in which to do them. This is why we expect robotic tutors to be more effective than language-learning apps such as Duolingo, which provide supervision through mobile notifications rather than physical presence. Jones (2007) advocates smaller group sizes because they promote active participation and deep rather than surface learning. However, Jones also notes that small groups typically require greater investment in key resources such as staff and meeting rooms.  
 
The presence of pedagogical agents was found to improve learning outcomes compared with no-embodiment conditions (static agents and/or no-agent conditions). Embodied agents, which possess human-like characteristics such as facial expressions, gestures, lip synchronization and body sway, significantly increase retention scores (Davis et al., 2022)<ref>Davis, R. O., Park, T., & Vincent, J. (2022). A Meta-Analytic Review on Embodied Pedagogical Agent Design and Testing Formats. ''Journal of Educational Computing Research'', ''61''(1), 30-67. https://doi.org/10.1177/07356331221100556 </ref>. Further support comes from Mayer and DaPra (2012)<ref>Mayer, R. E., & DaPra, C. S. (2012). An embodiment effect in computer-based learning with animated pedagogical agents. ''Journal of Experimental Psychology: Applied, 18''(3), 239–252. <nowiki>https://doi.org/10.1037/a0028616</nowiki></ref>, who found that learners performed better on a transfer test when a human-voiced agent displayed human-like gestures, facial expression, eye gaze and body movement than when it did not, yielding an embodiment effect. Participants in the study by Fiorini et al. (2024)<ref>Fiorini L, D'Onofrio G, Sorrentino A, Cornacchia Loizzo F, Russo S, Ciccone F, Giuliani F, Sancarlo D, Cavallo F, The Role of Coherent Robot Behavior and Embodiment in Emotion Perception and Recognition During Human-Robot Interaction: Experimental Study, JMIR Hum Factors 2024;11:e45494, URL: https://humanfactors.jmir.org/2024/1/e45494, DOI: 10.2196/45494 </ref> reported greater arousal and dominance when interacting with embodied robots than with voice-only interfaces. In a similar vein, Dennler et al. (2024)<ref>Dennler, N.S., Nikolaidis, S., & Matarić, M. (2024). Singing the Body Electric: The Impact of Robot Embodiment on User Expectations. ''ArXiv, abs/2401.06977''.</ref> suggest that embodiment increases perceived capability, which may affect information retention. Higher expectations may, in turn, lead to increased engagement, but also to disappointment if they are not met. In their meta-analysis, Ouyang and Xu (2024)<ref>Ouyang, F., Xu, W. The effects of educational robotics in STEM education: a multilevel meta-analysis. ''IJ STEM Ed'' 11, 7 (2024). https://doi.org/10.1186/s40594-024-00469-4 </ref> argue that instead of using robots to convey knowledge directly, instructors should use educational robotics to facilitate students’ learning experience, acting as facilitators who provide guidance and support.
 
=== Methods ===
The sample included 21 students aged 18 to 26, with a mean age of 21.1 years. Of the participants, 7 were female and 1 identified as Other. We chose a within-participant design to increase the statistical power of the study. One condition used a physical robot and the other used a video recording of the same robot. We used Vimmi, an artificial corpus created for research purposes that does not resemble any existing language (Macedonia et al., 2010)<ref>Macedonia, Manuela & Mueller, Karsten & Friederici, Angela. (2010). Neural Correlates of High Performance in Foreign Language Vocabulary Learning. Mind, Brain, and Education. 4. 125 - 134. 10.1111/j.1751-228X.2010.01091.x. </ref>.
 
For the robot tutor, we used Misty, a robot with an LED screen that can display facial expressions. To counterbalance potential order effects, 10 participants had Misty as their tutor first and 11 first received a laptop showing video recordings of Misty. We used the Wizard of Oz method to control the robot and the display from a different room. To avoid confounding effects of being with a peer, each participant completed the task individually. After being introduced to 5 Vimmi words by the tutor, participants took an oral test in which the tutor gave them a word in Vimmi and they had to answer with the correct English translation.
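The counterbalancing described above, combined with the randomization of words over conditions mentioned in the Week 4 updates, can be sketched as follows. Everything here is an assumption for illustration: the report does not say how condition orders were allocated or how words were drawn, the alternating rule and fixed seed are our choices, and the word list contains placeholders rather than the Vimmi items actually used.

```python
import random

# Hypothetical ordering rule: even-indexed participants start with the screen,
# so with 21 participants 11 start with the screen and 10 with the robot.
CONDITIONS = ("screen", "robot")

def assign_session(participant_index, word_pool, rng):
    """Return (condition order, words per condition) for one participant."""
    order = CONDITIONS if participant_index % 2 == 0 else CONDITIONS[::-1]
    words = list(word_pool)  # copy so the shared pool is not reordered
    rng.shuffle(words)       # random split of the words over the two conditions
    half = len(words) // 2
    return order, {order[0]: words[:half], order[1]: words[half:]}

rng = random.Random(2025)  # fixed seed so the schedule can be reproduced
pool = [f"vimmi_word_{k:02d}" for k in range(1, 11)]  # placeholders, not real Vimmi items
sessions = [assign_session(i, pool, rng) for i in range(21)]
print(sum(order[0] == "robot" for order, _ in sessions))  # prints 10
```

A fixed seed makes the schedule reproducible, which helps when an experimenter needs to look up a participant's condition order and word set on the day.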
 
After each condition, participants filled in a questionnaire consisting of the Cognitive Load Questionnaire (CLQ) (Paas, 1992)<ref>Paas, F. G. W. C. (1992). Training strategies for attaining transfer of problem-solving skill in statistics: A cognitive-load approach. ''Journal of Educational Psychology, 84''(4), 429–434. https://doi.org/10.1037/0022-0663.84.4.429 </ref>, the Robot Social Presence Questionnaire (RSPQ) (Chen et al., 2023)<ref>Chen, N., Liu, X., Zhai, Y. ''et al.'' Development and validation of a robot social presence measurement dimension scale. ''Sci Rep'' 13, 2911 (2023). https://doi.org/10.1038/s41598-023-28817-4 </ref> and a modified Godspeed Questionnaire (GQ) (Bartneck, 2009<ref>Bartneck, C., Kulić, D., Croft, E. ''et al.'' Measurement Instruments for the Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety of Robots. ''Int J of Soc Robotics'' 1, 71–81 (2009). https://doi.org/10.1007/s12369-008-0001-3 </ref>; Carpinella et al., 2017<ref>C. M. Carpinella, A. B. Wyman, M. A. Perez and S. J. Stroessner, "The Robotic Social Attributes Scale (RoSAS): Development and Validation," ''2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI)'', Vienna, Austria, 2017, pp. 254-262.</ref>). The CLQ was used to determine whether participants experienced a different cognitive load depending on which tutoring agent they received. We expected that the video recording of the robot might cause extraneous cognitive load and in turn reduce the cognitive resources available to the learner for integrating information with existing knowledge structures, because virtually presented agents could add processing demands to the environment in the form of visual or auditory distraction (Craig & Schroeder, 2017)<ref>Scotty D. Craig, Noah L. Schroeder, Reconsidering the voice effect when learning from a virtual human, Computers & Education, Volume 114, 2017, Pages 193-205, ISSN 0360-1315, https://doi.org/10.1016/j.compedu.2017.07.003 </ref>. The RSPQ was added because we were interested in whether participants judge the social presence of the robot tutor differently from that of the on-screen agent. Decreases in social presence have been negatively related to the impact and retention of presented information in online learning environments (Tu & McIsaac, 2002)<ref>Tu, C. H., & McIsaac, M. (2002). The Relationship of Social Presence and Interaction in Online Classes. ''American Journal of Distance Education'', ''16''(3), 131–150. https://doi.org/10.1207/S15389286AJDE1603_2 </ref>. We therefore wanted to measure whether physically embodying the tutor in the robot increases judgements of social presence, which could in turn increase the retention of information. Lastly, we used a modified GQ series informed by the Robotic Social Attributes Scale (RoSAS). We included this questionnaire to measure the participants' general impression of the robot across the dimensions of Anthropomorphism, Animacy, Likeability, Perceived Intelligence and Perceived Safety. This allowed us to better understand whether participants enjoyed the experience with either agent and whether there is a relationship between their judgement and their performance on the learning task.
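The report does not describe how item responses were combined into dimension scores. A common convention, assumed here, is to recode any reverse-keyed items and then average the items belonging to each dimension. The item IDs, the dimension mapping and the choice of reverse-keyed item below are hypothetical and do not reproduce the actual RSPQ or Godspeed item lists.

```python
from statistics import mean

def score_dimensions(responses, item_map, scale_max=5, reversed_items=()):
    """Average Likert item responses into one score per dimension.

    responses: dict item_id -> rating (1..scale_max)
    item_map: dict dimension -> list of item_ids
    reversed_items: item_ids whose scale runs the other way; they are
    recoded as (scale_max + 1 - rating) before averaging.
    """
    scores = {}
    for dimension, items in item_map.items():
        values = [
            scale_max + 1 - responses[i] if i in reversed_items else responses[i]
            for i in items
        ]
        scores[dimension] = mean(values)
    return scores

# Hypothetical item IDs and ratings for one participant (not the real questionnaire items).
item_map = {"likeability": ["L1", "L2", "L3"], "perceived_safety": ["S1", "S2", "S3"]}
responses = {"L1": 4, "L2": 5, "L3": 3, "S1": 2, "S2": 4, "S3": 3}
scores = score_dimensions(responses, item_map, reversed_items={"S2"})
print(scores)
```

The per-dimension means produced this way are what a paired comparison between the robot and screen conditions would then operate on.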
 
<nowiki>*</nowiki>add statistical tests used*
 
=== Results ===
 
=== Discussion & conclusion ===
 
=== References ===
 
=== Appendix ===
 
== Week 3 Updates ==
'''Research Question:''' Does the level of embodiment of the tutoring agent affect novel vocabulary retention?  


Between-participant design: half of the participants will have Misty as their tutor, the other half will have a laptop showing a video of Misty. Both groups will be able to ask the tutor to perform the same tasks (repeat a word, spell it out) by telling it to do so. We plan on using a Wizard of Oz setup in which we control the robot or what is displayed on the screen from a different room. We propose that each participant completes the task individually to avoid confounding effects of being with a peer. After the dedicated learning time has passed, participants will perform a short break task (to avoid recency effects). Following that, they will take an oral test in which the tutor gives them the foreign word (we settled on using Vimmi) and they have to answer with the correct English translation.  


=== Questionnaires to be used after experimental trial ===
 
'''Cognitive Load Questionnaire''' (Paas, 1992) - (1 question). We would like to see whether our participants experience a different cognitive load depending on which tutoring agent they receive. We believe that the video recording of the robot may cause extraneous cognitive load and in turn reduce the cognitive resources available to the learner for integrating information (in this case, new vocabulary) with existing knowledge structures. This is because virtually presented agents could add processing demands to the environment in the form of visual or auditory distraction (Craig & Schroeder, 2017). <blockquote>'''How much mental effort did you apply in completing the vocabulary learning task?'''


9-point Likert scale (1 = very, very low mental effort; 2 = very low mental effort; 3 = low mental effort; 4 = rather low mental effort; 5 = neither low nor high mental effort; 6 = rather high mental effort; 7 = high mental effort; 8 = very high mental effort; 9 = very, very high mental effort)


Re-test reliability = .90</blockquote>'''Robot Social Presence''' (Chen et al., 2023) - (19 questions). We are interested in whether our participants judge the social presence of the robot tutor differently from that of the on-screen agent. Decreases in social presence have been negatively related to the impact and retention of presented information in online learning environments (Tu & McIsaac, 2002). It would therefore be interesting to measure whether physically embodying the tutor in the robot increases judgements of social presence, which could in turn increase the retention of information.<blockquote>[[File:Robot Social Presence Scale (Chen et al., 2024).png|center|frameless|676x676px|Robot Social Presence Scale (Chen et al., 2023)]]</blockquote>'''Godspeed Questionnaire Series''' (Bartneck, 2009) - (24 items). We chose to include this questionnaire to measure the general impression that participants had of the robot across the dimensions of Anthropomorphism, Animacy, Likeability, Perceived Intelligence and Perceived Safety. This will allow us to better understand whether participants enjoyed the experience with either agent and whether there is a relationship between their judgement and their performance on the learning task. <blockquote>[[File:Godspeed Questionnaire Series (Bartneck, 2008).png|center|frameless|Godspeed Questionnaire Series (Bartneck, 2009)]]</blockquote>


=== Getting acquainted with the NAO robot ===
A trial using the Choregraphe screen-based app was run this week to get acquainted with the robot's motions and audio functions. The motions seem to be executed smoothly, but some mobility and balance issues might be encountered on the real robot. The audio did not work in the screen-based application; only a text bubble containing the assigned script was displayed above the NAO robot, so the auditory aspect will be assessed on the physical robot.  


== Week 2 Updates ==


=== Problem statement and objectives:   ===
It is no secret that there is a shortage of teachers in education. School districts are scrambling to find certified teachers, especially in world languages (Hanford, 2017; Koerting, 2017; Motoko, 2015). Eindhoven University of Technology (TU/e) faces this problem as well. A small investigation revealed the following: international students who want to take the Dutch classes offered by the university are repeatedly disappointed because demand simply cannot be met. There are far more interested students than available places in the Dutch classes they want to take. Students who do get into a class usually find it overfilled, leaving little one-to-one time with the tutor. Those who do not are put on a waitlist with no clear indication of whether they will be able to sign up for a language course in the desired quartile.  


To tackle this issue, we will look towards HTI and robotics. We believe robotics can help address this problem, as robots can assist tutors in their teaching. With robotic assistance, tutors can divide their attention while the robot takes over certain tasks, increasing the opportunities for quality learning. Our objective is to research providing students with a robotic tutor, to see whether this could be an adequate solution to this problem. We aim to increase students' opportunities to attend foreign language classes, even at times of high demand for such classes and a low supply of tutors.  


=== Who are the users? ===
 
University students often face high cognitive loads throughout their studies, so it is crucial to support them with efficient learning and retention strategies. Our problem statement focuses on international students at the TU/e who want to learn Dutch, but the concept applies to a much wider audience: any university with a large number of foreign students. The main user group that would benefit from our findings, and from any future development guidelines that stem from them, is therefore broader: any university student trying to learn a foreign language other than their language of study.  


=== What do they require? ===
 
The university students require opportunities to learn the language of the country that they are staying in. We believe that giving these students the opportunity to learn (predominantly vocabulary) through a robot will increase the capacity of the university to meet this demand. In this way, more students would be able to learn the local language, aiding their integration into the culture.  
The university students require opportunities to learn the language of the country that they are staying in. We believe that giving these students the opportunity to learn (predominantly vocabulary) through a robot, will increase the capacity of the university to meet this demand. In this way, more students would be able to learn the local language, aiding their integration into the culture.  


The main requirements of the users that we aim to cater to in this research are supervision and smaller groups. Supervision has been found to be very influential, as it increases the general effectiveness of learning (Holloway, 1992). It makes way for structured and repeated learning, giving students a clear way of doing things, as well as a clear timeslot in which to do them. This is why we expect robotic tutors to be more effective than language-learning apps such as Duolingo, which provide supervision through mobile notifications rather than physical presence. Jones (2007) advocates smaller group sizes, since they promote active participation and deep rather than surface learning. However, Jones also notes that small groups typically require greater investment in key resources such as staff and meeting rooms.


=== State of the art ===
 
In recent years, advancements in robotics have led to the development of social language-learning robots designed to enhance the way people learn new languages. Currently, the most prominent state-of-the-art robots of this kind target young and school-aged children as their main user base. Among the most advanced in this field are EMYS, Elias, and SKoBots, each offering a unique approach to interactive language education.


Deng, Qi et al. (2024): in this review of Robot-Assisted Language Learning (RALL) for '''adults''', the researchers suggest that interacting with robots allows learners to speak in a low-stress environment, improving fluency and confidence. Additionally, being taught by a physical robot, as opposed to one shown on video, was found to enhance performance on a vocabulary test and to improve engagement. The study by Van den Berghe et al. (2021) used Brain-Computer Interfaces to provide adaptive feedback, which the robot used to maintain the participant’s attention. However, some studies indicated that the introduction of social robots did not significantly improve learning outcomes. Lastly, implicit teaching methods, such as conversational ones, could improve grammar, and some studies found that they also improve pronunciation.


== Week 1 Updates ==


=== Problem statement and objectives:   ===
The world around us is ever-changing: technology evolves rapidly, and developments in computational power reshape our reality. It is therefore essential to understand how these changes affect us and how we can develop technology that contributes to human flourishing. Scenarios that once seemed to belong to science fiction novels are now being implemented by robotics developers. Understaffed fields that are less appealing to the broader public seem to benefit from these developers' attention; care robots, educational assistant robots, and factory robots are popular topics among robotics enthusiasts. An aspect that is frequently overlooked lies in the depths of these robots' interaction with people, where intrinsically human characteristics, such as social norms, trust, meaning, culture, and emotions, play a central role. Learning and education are a significant aspect of the human experience and contribute to the development of our species; therefore, this study focuses on Human-Robot Interaction (HRI) in the context of education, with an emphasis on the way information is delivered to human learners. Being a good educator is complex and involves many underlying characteristics, so this exploration addresses only one layer of what it takes to be a teacher: we examine which embodiment of the agent delivering the material yields the best recall of the content.


=== Who are the users? ===
 
The main user group that would benefit from our research findings, as well as from the future development guidelines that may stem from them, is university students, as they are the primary subgroup of students increasingly encountering robot-assisted learning environments (e.g., Pepper being used as a teaching assistant at Carnegie Mellon University). University students often face high cognitive loads throughout their studies; therefore, it is crucial to support them by providing efficient learning and retention strategies. When a robot teaches a student, one such strategy would be to assign the type of agent most suitable for the learning context.


=== What do they require? ===
 
For university students, it is important to have access to effective and adaptive learning strategies that help them manage high cognitive loads and improve their retention of information. In the context of robot-assisted education, this means that the robot working with the student needs to maintain a high level of engagement while facilitating understanding and optimizing recall. The content and the context of the environment in which the robot and student are situated need to be taken into account, as different situations will require different types of agents. Characteristics such as interaction style, verbal and non-verbal communication, and adaptability need to be clearly defined to best support learning. Additionally, students can benefit from personalized and interactive learning, in which robots adjust their approach to the individual's learning needs.


==== Useful vocabulary and theories: ====
 
The ''voice'' is an expressive aural medium of communication; it can be viewed as the "how" of vocalizations. The nonverbal paralinguistic properties that characterize the voice, such as tone, loudness, pitch, and timbre, together with prosodic properties such as rhythm, intonation, and stress, are called ''vocalics''. ''Speech'' is the linguistic content of the voice, the "what" of vocalizations, primarily consisting of words, grammar, syntax, and phonetics (Seaborn et al., 2021).


The voice effect assumes that people learn better when they are exposed to multimedia instruction that includes a human voice rather than a machine-synthesized one (Craig & Schroeder, 2017). Recorded human voices provide an experience that is easier to identify as a social interaction, thus promoting active learning. The effect can also be explained in terms of cognitive load: machine voices may cause extraneous cognitive load and reduce the cognitive resources available for integrating information with existing knowledge structures. Cognitive load could also increase because virtual humans add processing demands to the environment in the form of visual or auditory distraction.


==== Auditory Encoding and Short-Term Recall ====
The study by Colle (1980) supports the central masking hypothesis, suggesting that auditory noise interferes with visual recall because the speech loop must pass through the preperceptual auditory store, where it gets masked by noise. This aligns with the idea that AI-generated speech, with its inconsistent flow and unnatural pauses, could function as a form of "structured noise," disrupting inner dialogue and reducing recall ability.  


==== Topic Interest and Incidental Learning ====
Cancino’s (2019) research highlights how topic interest significantly influences vocabulary retention in incidental learning settings. This effect is mediated by cognitive processing depth and dictionary use.  


==== Auditory vs. Visual Short-Term Memory ====
Tillmann & Caclin (2021) provide evidence that auditory memory generally outperforms visual memory, especially for materials with a clear auditory contour. This suggests that structured auditory stimuli might enhance recall, whereas less structured sounds (like AI speech with unnatural intonations) could have the opposite effect. A comparison between human and AI voices could further validate this.


==== Auditory Similarity Effects in Recall ====
The study by Connor & Hoyer (1967) reinforces the idea that phonological (auditory) similarity affects recall more than visual similarity. This suggests that if AI-generated speech has distortions or inconsistencies, it might interfere with phonological encoding, reducing recall accuracy.  


==== AI Voices and Multimedia Learning ====
Mayer (2014) emphasizes that human voices enhance learning more than machine voices, as they foster a sense of social presence. However, McGinn & Torre’s (2020) study found that high-quality AI voices can be indistinguishable from human voices and do not necessarily impact learning outcomes. This is corroborated by Craig and Schroeder (2017) as well as Dinçer (2022), the latter specifically finding no differences in cognitive load between a modern synthetic voice and human speech.


==== Embodiment and Perception in Human-Robot Interaction ====
Studies by Wainer et al. (2006) and Seeger et al. (2018) show that physical embodiment enhances social presence and the way the agent is perceived. However, the effect is nuanced, since nonverbal cues alone can decrease perceived anthropomorphism due to the uncanny valley effect. If AI-generated speech is paired with a robotic presence, the combination of physical embodiment and voice type could influence recall.


==== Embodiment and learning ====
The presence of pedagogical agents improves learning outcomes compared with no-embodiment conditions (static agents and/or no-agent conditions). Embodied agents, i.e., those that possess human-like characteristics such as facial expressions, gestures, lip synchronization, and body sway, significantly increase retention scores (Davis et al., 2022). Further support comes from Mayer and DaPra (2012), who found that learners performed better on a transfer test when a human-voiced agent displayed human-like gestures, facial expressions, eye gaze, and body movement than when it did not, yielding an embodiment effect. Participants in the study by Fiorini et al. (2024) reported greater arousal and dominance when interacting with embodied robots than with voice-only interfaces. In a similar vein, Dennler et al. (2024) suggest that embodiment increases perceived capability, which may affect information retention. These findings indicate that higher expectations may lead to increased engagement, but also to potential disappointment if they are not met. More broadly, in their meta-analysis, Ouyang and Xu (2024) argue that instead of using robots to convey knowledge directly, instructors should use educational robotics to facilitate students’ learning experience and act as facilitators who provide guidance and support.


==== Social Cues in Multimedia and Human-Robot Interaction ====
Mayer’s (2014) research also suggests that social cues like conversational tone and embodiment enhance learning, aligning with Admoni & Scassellati’s (2017) findings that gaze cues improve engagement and trust in robots. This could imply that AI voices in robots might be more effective if combined with gaze behavior and facial expressions, as suggested by Schömbs et al. (2023).  


==== Body movements and tone of voice: ====
Velentza et al. (2021) found that robots with a cheerful personality and expressive body movements are more engaging and desirable for educational interactions. They also caution that overly friendly storytelling can reduce engagement, as it may come across as unnatural or excessive. Additionally, embodied robots using naturalistic gestures lead to higher perceived emotional engagement (Fiorini et al., 2024). These findings highlight the importance of synchronized verbal and non-verbal cues for effective communication. Furthermore, users tend to expect more human-like behavior from robots with a physical body than from virtual ones (Dennler et al., 2024). Regarding pitch, Suzuki et al. (2003) found that humans are sensitive to even slight changes in synthetic voice pitch and can interpret these changes as either confirmation or negation, which can be an important factor in problem solving and a consideration for designing effective learning environments. Still, the voice should not be too cute, as that can hinder learning outcomes (Jing et al., 2024).


==== What we know about human preferences for robot voices: ====
Masculine-voiced agents are perceived as more "informative", and social presence is rated higher when a robot’s perceived gender matches its voice (Seaborn et al., 2021). This matters because higher perceived social presence is associated with improved learning outcomes (Craig & Schroeder, 2017). Both feminine and masculine voices are considered appropriate for educational settings (Seaborn et al., 2021). More specifically, the Nao robot was perceived as friendlier and more trustworthy with a masculine voice, which was also judged a better overall fit for it (Seaborn et al., 2021). The use of vocal fillers also tends to enhance user experiences with voice agents: when robots used hedges and discourse markers, such as vocal fillers, people responded to them much as they would to humans (Seaborn et al., 2021).


=== References ===
<references />
The full list of the 25 studies can be found [https://tuenl-my.sharepoint.com/:x:/r/personal/l_w_f_j_vrouenraets_student_tue_nl/_layouts/15/Doc.aspx?sourcedoc=%7BCAE64DA5-6534-4B57-AEB7-C2C4A4684775%7D&file=Articles%20overview.xlsx&action=default&mobileredirect=true&DefaultItemOpen=1&web=1 here].

Latest revision as of 14:21, 10 April 2025

Robot-assisted vocabulary learning

Abstract

In this paper, we investigated robot-assisted learning of novel vocabulary, examining whether differences in the tutoring agent's embodiment result in differences in performance on a language-based retention test. We also examined factors that may contribute to higher retention scores, including experienced cognitive load, the perceived social presence of the robot, and general impressions of the robot. A within-subject design was used: 21 participants completed a vocabulary learning task using words from an artificial language, Vimmi. Participants learned both from a physically present robot, Misty, and from video recordings of the same robot presented on a laptop screen. We found no statistically significant evidence that either embodiment condition led to better test performance. The post-test questionnaires revealed that participants who rated the robot as more likable were also more likely to perform better on the task. Additionally, differences in safety perception emerged, suggesting that the physically embodied robot made participants feel less safe than the one presented on a screen.


Names (id): Margot Dijkstra (1893793), Llywelyn Vrouenraets (1879790), Sem Schreurs (1809539), Vladis Michail (1792814), Alessia-Maria Postelnicu (1839330), Sebastian Ciulacu (1886711)

Supervisors: Elena Torta (e.torta@tue.nl), Raymond Cuijpers (r.h.cuijpers@tue.nl), Mel Sexton (m.sexton@tue.nl)

Introduction

Problem statement and objectives:  

It is well known that there is a shortage of teachers in education. School districts are struggling to find certified teachers, especially in world languages (Hanford, 2017[1]; Koerting, 2017[2]; Motoko, 2015[3]). Eindhoven University of Technology faces this problem as well. A search on Osiris revealed that there is more demand for Dutch courses than the TU/e currently offers. Additionally, the course registration page shows that most timeslots are already full early in the quartile preceding the one in which the classes are scheduled. Furthermore, students taking Dutch in the third quartile received an email in the first week of the course urging them to sign up already for the follow-up course in the fourth quartile due to a "very high demand". As a result, international students who want to make use of the Dutch classes provided by the university are disappointed time and time again, as the demand simply cannot be met: there are far more students than available places in Dutch classes. Students who do manage to get into a class usually find it overcrowded, with little one-to-one time with the tutor. Those who do not end up on a waitlist with no clear indication of whether they will be able to sign up for a language course in the desired quartile.

To tackle this issue, we looked towards HTI and robotics. We believe robotics can be a good tool for addressing this problem, as it can be used to assist tutors in their teaching. With robotic assistance, tutors can divide their attention while the robot takes over certain tasks, increasing the opportunities for quality learning. We aim to increase students' opportunities to attend foreign language classes, even at times of high student demand and a low supply of tutors for such classes.

Who are the users?

University students often face high cognitive loads throughout their studies, so it is crucial to support them with efficient learning and retention strategies. Cognitive load theory suggests that when a student’s mental load becomes very high, it may overwhelm their cognitive capacity, impairing learning and information retention (Sweller, 1988)[4]. Our problem statement focuses on international students at the TU/e who want to learn Dutch, but the concept applies to a much wider audience: any university with a large number of foreign students. The main user group that would benefit from our research findings, and from the future development guidelines that may stem from them, is therefore more general: any university student trying to learn a foreign language other than their language of study.

What do they require?

The university students require opportunities to learn the language of the country where they are staying. We believe that giving these students the opportunity to learn (predominantly vocabulary) through a robotic tutor will increase the capacity of the university to meet this demand. In this way, more students would be able to learn the local language, aiding their integration into the culture.  

The main user requirements that we aim to address in this research are supervision and smaller groups. Supervision has been found to be very influential, as it increases the general effectiveness of learning (Holloway, 1992)[5]. It enables structured and repeated learning, giving students a clear way of doing things, as well as a clear timeslot to do it in. This is why we expect robotic tutors to be more effective than language-learning apps such as Duolingo, which provide supervision through mobile notifications rather than physical presence. Jones (2007)[6] advocates smaller groups because they promote active participation and deep rather than surface learning. However, Jones also notes that small groups typically require greater investment in key resources such as staff and meeting rooms.

State of the Art

In recent years, advances in robotics have led to the development of social language-learning robots designed to enhance the way people learn new languages. Currently, the most prominent state-of-the-art robots of this kind target young and school-aged children as their main user base. Among the most advanced in this field are EMYS, Elias, and SkoBots, each offering a distinct approach to interactive language education. EMYS is an expressive robot designed specifically for young, mainly bilingual, language learners, typically aged 3 to 7. It has a mechanized face with three moving discs, somewhat resembling a turtle, that allow it to display emotions. The robot comes with sets of picture cards (for example, of animals) that children can use to expand their vocabulary (EMYS Robot + 3 Sets, 2025)[7].

Figure 1: EMYS. [8]

Elias Robot, on the other hand, is designed primarily for classroom environments and provides AI-powered conversational practice in multiple languages. Physically, it is essentially a Nao robot. It integrates with educational systems and tablets, allowing teachers to customize lessons and track student progress via a dedicated app. It appears to offer more functions than EMYS, as it can complete more language-learning-oriented tasks, such as pronouncing a specific sentence aloud, and it can perform a range of physical movements, such as dancing. Still, EMYS appears to be more emotionally expressive, while Elias is more bodily expressive (Elias Robot | Elias Robot, 2022)[9].

Figure 2: Elias robot.[10]

SkoBots represent a more niche, yet more exciting, venture in language-learning robots than the previous two projects. A SkoBot is a 3D-printed companion that sits on the user's shoulder and helps its mainly Native American user base learn a Native American language. Unlike EMYS and Elias, SkoBots cater to a wider range of users, including older students and independent learners. Most of the sample videos show the robots helping Native American teens strengthen their vocabulary and providing dynamic, real-time corrections. Notable strengths of this robot are its unique design, ease of assembly, and low price point, which make it accessible to marginalized communities (SkoBots Language Learning, n.d.)[11].

Figure 3: SkoBots.[12]

Literature Background

The main mediating factors for the success of learning with an artificial tutor appear to be the cognitive load experienced by the learner and the perceived social presence of the tutoring agent. Virtual humans could add processing demands to the environment in the form of visual or audio distraction, thus increasing cognitive load (Craig & Schroeder, 2017)[13]. At the same time, studies such as Wainer et al. (2006)[14] and Seeger et al. (2018)[15] show that physical embodiment enhances social presence and perception. Furthermore, increases in social presence have been positively related to the impact and retention of presented information in online learning environments (Tu & McIsaac, 2002)[16].

In previous research, the presence of pedagogical agents was found to improve learning outcomes compared with non-embodied conditions, including static agents and no-agent conditions. Embodied agents, which possess human-like characteristics such as facial expressions, gestures, lip synchronization, and body sway, significantly increase retention scores (Davis et al., 2022)[17]. Further support comes from Mayer and DaPra (2012)[18], who found that learners performed better on a transfer test when a human-voiced agent displayed human-like gestures, facial expressions, eye gaze, and body movement than when it did not, yielding an embodiment effect. Participants in the study by Fiorini et al. (2024)[19] reported greater arousal and dominance when interacting with embodied robots than with voice-only interfaces. In a similar vein, Dennler et al. (2024)[20] suggest that embodiment increases perceived capability, which may affect information retention. Together, these findings indicate that raised expectations may increase engagement but may also lead to disappointment if they are not met. In their meta-analysis, Ouyang and Xu (2024)[21] argue that instead of using robots to convey knowledge directly, instructors should use educational robotics to facilitate students' learning experience and should themselves act as facilitators who provide guidance and support to students.

More specifically for Robot-Assisted Language Learning (RALL), Van Den Berghe et al. (2018)[22] suggest in their review that children may learn words from robotic agents as well as from human teachers or with assistance from their peers. Zinina et al. (2022)[23] asked university-aged linguistics students to practice vocabulary in Latin (a language that was foreign to them) and then to evaluate their experience and the robot's performance as an assistant tutor. The students reported a positive impression of the robot, increased motivation, and a desire to use robot-assisted learning in the future. The review of RALL for adults by Deng, Qi et al. (2024)[24] suggests that interacting with robots allows learners to speak in a low-stress environment, improving fluency and confidence. The same review reports that being taught by a physical robot, rather than one on video, enhanced performance on a vocabulary test and improved engagement.

Research Aim and Hypothesis

In our present research, we investigated the effect of two different levels of embodiment of an artificial tutor on the recall of novel vocabulary. Specifically, we compared a physical robot condition with videos of the same robot on a screen. This led us to formulate the following research question: Does the level of embodiment of the tutoring agent affect novel vocabulary retention? Based on previous research, we hypothesize that the physical robot tutor condition will result in higher novel vocabulary retention relative to a screen-based agent.  

Methods

Participants & design

The sample included 21 students aged 18 to 26 (mean age 21.1 years). Of the participants, 7 were female and 1 identified as other. We chose a within-participant design to control for individual differences. One condition used a physical robot, and the other used a video recording of the same robot. We used Vimmi, an artificial language created for research purposes that does not resemble any existing language (Macedonia, 2010)[25]. Misty II served as the tutor robot (Misty II | Misty Robotics, n.d.)[26]. Ten participants had Misty as their tutor first, and 11 first received a laptop showing video recordings of Misty; this was done to minimize potential order effects. To avoid confounding effects from the presence of a peer, each participant completed the task individually.
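
The counterbalancing of condition order can be illustrated with a short Python sketch. It is not the procedure the group actually used (the text does not say how orders were allocated); the helper name `assign_orders`, the participant IDs, and the seed are hypothetical. The sketch shuffles a balanced pool of the two orders, so with 21 participants one order necessarily goes to 11 people and the other to 10, matching the split reported above.

```python
import random


def assign_orders(participant_ids, seed=None):
    """Give each participant one of two condition orders, keeping the two
    groups as equal in size as possible (hypothetical helper)."""
    robot_first = ("robot", "screen")
    screen_first = ("screen", "robot")
    n = len(participant_ids)
    # Balanced pool of orders; with an odd n, screen-first gets the extra slot.
    pool = [screen_first] * ((n + 1) // 2) + [robot_first] * (n // 2)
    rng = random.Random(seed)
    rng.shuffle(pool)  # randomize which participant receives which order
    return dict(zip(participant_ids, pool))


if __name__ == "__main__":
    ids = [f"P{i:02d}" for i in range(1, 22)]  # hypothetical IDs P01..P21
    assignment = assign_orders(ids, seed=8)
    n_robot_first = sum(order[0] == "robot" for order in assignment.values())
    print(n_robot_first, len(assignment) - n_robot_first)  # 10 11
```

Shuffling a fixed, balanced pool (rather than flipping a coin per participant) guarantees the group sizes differ by at most one.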

Procedure & setting

We used the Wizard of Oz method, controlling the robot or the display from a different room. After being introduced to 5 Vimmi words by the artificial tutor, participants took an oral test in which the tutor gave them a word in Vimmi and they had to answer with the correct English translation. We used Misty for the embodied condition because it is equipped with an LED screen that allows facial expressions with eyes and eyebrows, as well as head and arm movements. The robot is 35.5 cm tall, and its arms can rotate only 180 degrees. Because the experimenters were in another room, we used a microphone setup so that we could hear participants in case something went wrong or they needed help. During the experiment and the questionnaires, the participant was alone in the room.

Measurements

After each condition, participants filled in a questionnaire consisting of the Cognitive Load Questionnaire (CLQ) (Paas, 1992)[27], the Robot Social Presence Questionnaire (RSPQ) (Chen et al., 2023)[28], and a modified Godspeed Questionnaire (GQ) (Bartneck, 2009[29]; Carpinella et al., 2017[30]). The CLQ was used to see whether participants experienced a difference in cognitive load depending on the tutoring agent they received. We expected that the video recording of the robot might cause extraneous cognitive load and, in turn, reduce the cognitive resources available to the learner for integrating information with existing knowledge structures, because virtually presented agents may add processing demands in the form of visual or audio distraction (Craig & Schroeder, 2017)[31]. The RSPQ was added because we were interested in whether participants judged the social presence of the robot tutor differently from that of the on-screen agent. Decreases in social presence have been negatively related to the impact and retention of presented information in online learning environments (Tu & McIsaac, 2002)[32]. We therefore wanted to measure whether physically embodying the tutor in the robot increases judged social presence, which could in turn increase information retention. Lastly, we used a modified version of the GQ series, RoSAS, to measure participants' general impression of the robot across the dimensions of Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety. This allowed us to better understand whether participants enjoyed the experience with either agent in the first place and whether their judgments were related to their performance on the learning task. The survey data were collected with lab.js.

Statistical analysis

For the statistical testing, we first used Cronbach's alpha to assess the reliability of the GQ and RSPQ dimensions. We then conducted paired t-tests to examine whether test scores differed between the two conditions, and whether participants scored differently on the CLQ and on the dimensions of the RSPQ and GQ between the two conditions. Cohen's d was used to estimate effect sizes. Finally, linear regression was used to determine whether any of the variables significantly predicted the test score.
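
As an illustration of the paired comparison, the Python sketch below computes the paired t statistic and one common paired effect size, Cohen's d_z (the mean of the within-participant differences divided by their standard deviation). It is an assumption that d_z is the variant used here; the text only says "Cohen's d", and other paired variants exist. The scores are hypothetical, and the p-value, which requires the t distribution with n - 1 degrees of freedom, is left to a statistics package (for example `scipy.stats.ttest_rel` in Python).

```python
import math
import statistics


def paired_t_and_dz(cond_a, cond_b):
    """Paired-samples t statistic, Cohen's d_z, and degrees of freedom.

    d_z = mean(differences) / sd(differences) is one of several paired
    effect-size variants (an assumption; the study only says "Cohen's d").
    """
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]  # within-participant differences
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    t = mean_d / (sd_d / math.sqrt(n))
    d_z = mean_d / sd_d
    return t, d_z, n - 1


# Hypothetical vocabulary scores (out of 5) for five participants.
screen = [5, 4, 5, 3, 4]
robot = [4, 4, 3, 3, 4]
t, d_z, df = paired_t_and_dz(screen, robot)
print(round(t, 2), round(d_z, 2), df)  # 1.5 0.67 4
```

Because both quantities are built from the same differences, d_z always equals t divided by the square root of the number of pairs, which is a quick consistency check on reported values.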

Results

We investigated whether there is a relationship between the embodiment of a robot tutor and vocabulary retention. Our main statistical test in this analysis was the paired t-test. All hypothesis testing was conducted in Stata/BE 18.

Reliability of the Questionnaires

The Godspeed Questionnaire

We conducted a Cronbach's Alpha analysis per dimension to assess the reliability of the questionnaire dimensions in our dataset. Our study included the Godspeed questionnaire, which has five dimensions (Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety), and the Social Presence questionnaire.

In the Godspeed questionnaire, the Anthropomorphism dimension had acceptable reliability (α = 0.7587). Naturalness (high means natural, low means fake) showed the highest item-test correlation (0.8733), and Lifelikeness (high means lifelike, low means artificial) the lowest (0.7442). These results suggest that all items contribute meaningfully to the dimension, so no item removal was necessary. Although Lifelikeness had the lowest item-test correlation within Anthropomorphism, it had the strongest correlation (0.7791) within the Animacy dimension. The lowest-correlated item in Animacy was Aliveness (the higher, the more alive; the lower, the more dead), with a correlation of 0.6720. The Animacy dimension has moderate internal consistency (α = 0.6833). Given the exploratory scope of this study, we considered the dimension sufficiently reliable and retained it, but future applications could improve it by revising or replacing some items. Likeability (α = 0.8804) and Perceived Safety (α = 0.8171) showed the highest reliability in the Godspeed questionnaire. All items from these dimensions were retained, as removing any of them would reduce alpha. For Likeability, Niceness showed the highest item-test correlation (0.8951); higher scores meant the robot seemed nicer, and lower scores that it was perceived as awful. For Perceived Safety, the item-test correlation was highest for Relaxation (the higher, the more relaxed; the lower, the more anxious) and lowest for Safety. Perceived Intelligence initially had a low Cronbach's alpha of 0.6078.
The items Responsibility (high means responsible, low means irresponsible) and Sensibility (high means sensible, low means foolish) were removed because of their low item-test correlations. After omitting these two items, the alpha of the Perceived Intelligence dimension improved to 0.7537.

The Social Presence Questionnaire

Chen et al. (2023) do not name the dimensions of the Social Presence Questionnaire, so for this exploration we assigned the following names: Dimension 1, Perceived Engagement with Social Robots; Dimension 2, Robot Collaboration and Adaptability; Dimension 3, Emotional Influence of Robots; Dimension 4, Perceived Communication Ability. Dimension 5 remains unnamed because it contains only two items.

The reliability of the first four dimensions ranges from acceptable to borderline. Perceived Engagement with Social Robots had an alpha of 0.7948; all items contributed positively, and no removals were necessary. Robot Collaboration and Adaptability (α = 0.6564) and Emotional Influence of Robots (α = 0.6810) fall slightly below the conventional threshold of 0.70, but for this exploratory paper we considered them acceptable and retained all their items for further analysis. Perceived Communication Ability has acceptable reliability (α = 0.7266), and removing any item would lower alpha, so all items were retained.

We excluded the fifth dimension from our analysis, as it contains only two items and has extremely poor reliability (α = 0.0523).


After this reliability analysis, we proceeded to test the hypothesis formulated in the introduction.

Hypothesis testing

A paired t-test showed no statistically significant difference in mean test scores between the two conditions (p = 0.086). However, the p-value indicates a trend (p < 0.10) toward higher scores in the screen condition (see Figures 4 and 5).

Figure 4: Mean Scores in Screen and Robot Condition
Figure 5: Individual Performance across Conditions


For the CLQ, there was no significant difference between the robot and the screen conditions (p = 0.309). There was also no significant correlation between test score and cognitive load in either the screen condition (p = 0.494) or the robot condition (p = 0.343).


In the GQ, the Perceived Safety dimension differed significantly between the two conditions (p = 0.007): the robot was perceived as safer in the screen condition than in the physical robot condition. No significant differences were found for the other dimensions: Animacy (p = 0.261), Anthropomorphism (p = 0.156), Likeability (p = 0.677), and Perceived Intelligence (p = 0.467). The effect sizes for each dimension can be found in Figure 6.

Figure 6: Cohen's d for each dimension (Godspeed questionnaire)


A linear regression was also performed for each condition separately. Likeability emerged as the only significant predictor of the score in the robot condition (p = .004, regression coefficient = 1.63). There were no significant findings for the screen condition.

In the RSPQ, the perceived presence dimension differed significantly between the two conditions (p = .0002), indicating that the physical robot was perceived as significantly more present than the screen version. The Emotional Atmosphere dimension also differed significantly (p = .008), suggesting that participants experienced more of an emotional atmosphere with the physical robot than in the screen condition. The remaining dimensions, helpfulness & collaboration (p = .947), communication ability (p = .638), and distraction (p = .877), did not differ significantly between the two conditions. The effect sizes can be found in Figure 7.

Figure 7: Cohen's d for each dimension (Social Presence questionnaire)

Discussion

Main findings

Although the difference was not significant, the trend toward better average performance with the screen version of Misty suggests that a screen-based approach could be used for the time being. Van Den Berghe et al. (2018)[22] suggest that children may be able to learn words from robotic agents as well as from human teachers. In contrast, the review of RALL for adults by Deng, Qi et al. (2024)[24] suggests that interacting with robots allows learners to speak in a low-stress environment, improving fluency and confidence, and reports that being taught by a physical robot, rather than one on video, enhanced vocabulary test performance and engagement. Our results are more in line with the findings of Van Den Berghe et al. (2018)[22], as the difference in scores between the two conditions was not significant. Our findings do not support the suggestion by Craig & Schroeder (2017) that virtual humans, in our case the videos of Misty, add cognitive load, as we found no difference in cognitive load scores between the two conditions. They do, however, complement the studies by Wainer et al. (2006)[14] and Seeger et al. (2018)[15], as Misty was judged to be more socially present in the robot condition than in the screen condition. We further found that the physical robot was perceived as less safe than the on-screen version, with a moderate effect size (Cohen's d = 0.66), which is at odds with the idea in Deng, Qi et al. (2024)[24] that robots provide a safe, low-stress environment. Finally, likeability was a significant predictor of score with the physical robot: the predicted score increased by 1.63 points for every 1-point increase in likeability on the five-point Likert scale.

Potential limitations

Several limitations must be addressed. With a total of 21 participants, our sample was smaller than ideal for detecting subtle effects. Previous research indicates that embodiment increases test performance (Deng et al., 2024)[24], which our findings did not replicate. Several factors could have contributed to this. Most RALL studies did not use Misty as the tutor robot; Misty is rather small and has a visible camera, which several participants informally criticized. The Wizard of Oz method may also have introduced delays, since a human had to control the robot in real time, whereas for the video the researcher only had to press one button to advance to the next section. Additionally, the number of words was limited to five per condition, so we may have tested participants' working memory rather than vocabulary retention. Future research should address these constraints by increasing the sample size and the number of words, and by automating robot control.

Implications of the study

Since there were no significant differences in scores, a screen-based approach could, for the time being, be used for language learning without a physical tutor present; students may learn equally well from a physically present robot and from a recording of an artificial agent. Participants felt safer when shown videos of the robot than when the robot was physically present, so we recommend future research to determine which specific factors make people feel unsafe. Furthermore, participants who performed better in the robot condition rated the robot as more likable. We therefore also recommend research into which factors make individuals like a robot more, which may in turn improve their learning.

At present, we cannot conclude that robot tutors can be used efficiently in language teaching settings, especially since online classes are possibly more convenient and less costly. However, further research into RALL for adults, into what makes robots good or bad tutors, and into how to design them accordingly may relieve some of the pressure on language tutors that the TU/e and many other institutions face.

Appendix

Appendix A: Logbook

logbook

Appendix B: Materials used for experiment:

Files

Appendix C: Weekly updates

Week 6 updates

Report progress:

Research script: contains the research script we used for each participant in order to standardize the instructions.

Experimenter instructions: contains the instructions for the experimenters conducting the procedure. The second page of the document also has direct links to the Canva files we created for the "screen" condition.

Experimental code: contains the scripts used for Misty, with the corresponding blocks either disabled or enabled.

Questionnaire: the zip file contains the HTML for the questionnaire we created in lab.js. Note that for the link to work, the HTML file must be in the same folder as the rest of the files in the zip, due to dependencies.

Videos: contains the unedited videos we filmed of Misty. These were then placed in the Canva slides linked in the experimenter instructions document.

Week 5 updates

We started the week by adding the final touches to the questionnaire, preparing the participant schedule, writing the researcher script, and finalizing the programming of Misty. We also filmed the robot for the on-screen condition. We then ran the experiment according to the schedule given in the Week 4 updates.

Week 4 updates

We experimented with the robot to decide which words to use. All group members got acquainted with Misty and learned how to operate the robot. We reserved the rooms for the experiment: the sessions will take place on the 8th floor of the Atlas building on the 18th, 19th, 20th, and 21st of March (during working hours). We settled details of the experiment, such as using 5 words per condition and randomizing the words across conditions. We realized that we should charge the robot before the experiment and included that in our planning. Fifteen participants have already been contacted to ask for their availability, and more people will be contacted during the weekend.
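
Randomizing the words across conditions could be done as sketched below in Python. The word list is a placeholder (the actual Vimmi items are not listed here), and the helper `split_words` and the seed are hypothetical. The sketch draws 10 distinct words without replacement and gives 5 to each condition, so no word appears in both.

```python
import random


def split_words(words, per_condition=5, seed=None):
    """Draw 2 * per_condition distinct words and split them between the
    robot and screen conditions (hypothetical helper)."""
    if len(words) < 2 * per_condition:
        raise ValueError("word pool is too small for two disjoint sets")
    rng = random.Random(seed)
    drawn = rng.sample(words, 2 * per_condition)  # sampling without replacement
    return {"robot": drawn[:per_condition], "screen": drawn[per_condition:]}


# Placeholder labels; the actual Vimmi words are not reproduced here.
pool = [f"word{i:02d}" for i in range(1, 11)]
sets = split_words(pool, seed=3)
print(sets["robot"])
print(sets["screen"])
```

Calling the helper with a different seed per participant would give each participant their own random split while keeping the two sets disjoint.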

Below is a rough plan for next week:

Monday: Recording video for the screen-based condition
Tuesday: Running the experiments (9:00-17:00)
Wednesday: Running the experiments (9:00-17:00)
Thursday: Running the experiments (9:00-17:00)
Friday: Running the experiments (9:00-17:00)

Carnival Break: During the break, we programmed our questionnaires and worked with the robots twice. We decided to use Misty, as we found that Nao can suffer from some unfortunate delays. We discovered that the Misty App is no longer available in the EU, so we were unsure how to proceed; we would like to bring this up at the meeting. We also finalized our ERB form and created the questionnaires we are going to present to our participants.

Week 3: For the third week our main task was to collect practical material and familiarize ourselves with the robots. We held a meeting to discuss the possible questionnaires we can use as well as how we would like to program the robots. Additionally, we dedicated time to finding prospective research participants by attending a BRM1 lecture and giving a short talk to the students, as well as asking our friends.

Week 2: For the second week, our main task was to refine our project based on the feedback we received at the Monday tutor session. We held a meeting to discuss possible directions for our research and settled on educational robots for vocabulary learning. We then divided the work needed to update our report and again worked in pairs. The three main tasks were: 1. updating the problem statement and objectives, 2. writing an ERB form to be sent for ethical approval, and 3. narrowing our state-of-the-art findings to focus specifically on Robot-Assisted Language Learning (RALL).

Week 1: For the first week, we all looked for relevant research articles and ended up with 25 in total. Each of us then summarized the articles in bullet points covering the most relevant findings, and we had a one-hour meeting on Friday the 14th to discuss them. At this meeting, we divided the work needed to update the wiki into sections and worked in pairs to accomplish it.

You can find the individual time contributions on the second page of the logbook, called Time Use.

Upcoming meeting agenda: Meeting agenda Monday 2025-02-24.docx

Week 3 Updates

Research Question: Does the level of embodiment of the tutoring agent affect novel vocabulary retention?

Hypothesis: Embodiment in a robotic agent will result in higher novel vocabulary retention relative to a screen-based agent.

Study design:

Participant sample: TU/e students.


Between-participant design: half of the participants will have Misty as their tutor, and the other half will have a laptop showing videos of Misty. Both groups will be able to ask the tutor to perform the same tasks (repeat a word, spell it out) by telling it to do so. We plan to use a Wizard of Oz setup in which we control the robot, or what is displayed on the screen, from a different room. We propose that each participant complete the task individually to avoid confounding effects from the presence of a peer. After the dedicated learning time has passed, participants will perform a short break task (to avoid recency bias). They will then take an oral test in which the tutor gives them the foreign word (we settled on using Vimmi) and they have to answer with the correct English translation.

Questionnaires to be used after experimental trial

Cognitive Load Questionnaire (Paas, 1992) - (1 question). We would like to see whether our participants experienced a difference in cognitive load based on which tutoring agent they received. We believe that the video recording of the robot may cause extraneous cognitive load and in turn reduce the cognitive resources available to the learner to integrate information (in this case, new vocabulary) with existing knowledge structures. This is because virtually presented agents could add processing demands to the environment in the form of visual or audio distraction (Craig & Schroeder, 2017).

How much mental effort did you apply in completing the vocabulary learning task?

Please choose the category (1, 2, 3, 4, 5, 6, 7, 8, or 9) that applies to you:

9-point Likert scale (1 = very, very low mental effort; 2 = very low mental effort; 3 = low mental effort; 4 = rather low mental effort; 5 = neither low nor high mental effort; 6 = rather high mental effort; 7 = high mental effort; 8 = very high mental effort; 9 = very, very high mental effort)

Re-test reliability = .90

Robot Social Presence (Chen et al., 2024) - (19 questions). We are interested in whether our participants judge the social presence of the robot tutor differently from the social presence of the on-screen agent. Decreases in social presence have been negatively related to the impact and retention of presented information in online learning environments (Tu & McIsaac, 2002). We would therefore like to measure whether physically embodying the tutor in the robot increases the judged social presence, which could in turn increase information retention.

Robot Social Presence Scale (Chen et al., 2024)

Godspeed Questionnaire Series (Bartneck, 2008) - (24 items). We chose to include this questionnaire to measure the general impression that the participants had of the robot across the dimensions of Anthropomorphism, Animacy, Likeability, Perceived Intelligence and Perceived Safety. This will allow us to better understand whether the participants enjoyed the experience with either agent in the first place and whether there is a relationship between their judgement and performance on the learning task.

Godspeed Questionnaire Series (Bartneck, 2008)

Getting acquainted with the NAO robot

A trial using the Choregraphe screen-based app was run this week to get acquainted with the robot's motions and audio functions. The motions seem to be executed smoothly, but on the physical robot some mobility and balance issues might be encountered. The audio did not work in the screen-based application; only a text bubble containing the assigned script was displayed above the NAO robot, so the auditory aspect will be assessed on the physical robot.

Week 2 Updates

Problem statement and objectives:  

It is widely recognized that there is a shortage of teachers in education. School districts are scrambling to find certified teachers, especially in world languages (Hanford, 2017; Koerting, 2017; Motoko, 2015). Eindhoven University of Technology faces this problem as well. A small investigation revealed that international students who want to take the Dutch classes provided by the university are disappointed time and time again, since the demand simply cannot be met: there are far more students than places in the Dutch classes they want to take. Those who do get into a class usually find it overcrowded, with little one-to-one time with the tutor. Those who do not end up on a waitlist with no clear indication of whether they will be able to sign up for a language course in the desired quartile.

To tackle this issue, we will look towards HTI and robotics. We believe robotics can be a good tool for addressing this problem, as robots can assist tutors in their teaching. With robotic assistance, tutors can divide their attention while the robot takes over certain tasks, increasing the opportunities for quality learning. Our objective is to investigate providing students with a robotic tutor, to see whether this could be an adequate solution to this problem. We aim to help increase the opportunity for students to attend foreign language classes, even in times of high demand for such classes and a low supply of tutors.

Who are the users?

University students often face high cognitive loads throughout their studies, so it is crucial to support them with efficient learning and retention strategies. Our problem statement focuses on international students at the TU/e who want to learn Dutch, but the concept applies to a wider audience: any university with a large number of foreign students. The main user group that would benefit from our research findings, as well as from any future development guidelines that stem from them, is therefore more general, namely any university student learning a foreign language other than the language in which they study.

What do they require?

University students require opportunities to learn the language of the country they are staying in. We believe that giving these students the opportunity to learn (predominantly vocabulary) through a robot will increase the university's capacity to meet this demand. In this way, more students would be able to learn the local language, aiding their integration into the culture.

The main user requirements we aim to address in this research are supervision and smaller groups. Supervision has been found to be very influential, as it increases the general effectiveness of learning (Holloway, 1992). It enables structured and repeated learning, giving students a clear way of doing things as well as a clear time slot to do it in. This is why we expect robotic tutors to be more effective than language-learning apps such as Duolingo, which provide supervision through mobile notifications rather than physical presence. Jones (2007) advocates smaller group sizes, since they promote active participation and deep rather than surface learning. However, Jones also notes that small groups typically require greater investment in key resources such as staff and meeting rooms.

State of the art

In recent years, advancements in robotics have led to the development of social language-learning robots designed to enhance the way people learn new languages. Currently, the most prominent state-of-the-art robots of this kind are designed primarily for young and school-aged children. Among the most advanced in this field are EMYS, Elias, and SKoBots, each offering a unique approach to interactive language education.

EMYS is an expressive robot designed for young, mainly bilingual, language learners, typically aged 3 to 7. Somewhat resembling a turtle, it has a mechanized face with three moving discs that allow it to display emotions. The robot comes with a set of picture cards (e.g., of animals) that children can use to expand their vocabulary.


Elias Robot, on the other hand, is designed primarily for classroom environments and provides AI-powered conversational practice in multiple languages. Its hardware is essentially that of a Nao robot. It integrates with educational systems and tablets, allowing teachers to customize lessons and track student progress via a dedicated app. It appears to have more functions than EMYS, as it can complete more language-learning-oriented tasks, such as pronouncing a specific sentence aloud, and it can perform a range of physical movements, such as dancing. Still, EMYS appears to be more emotionally expressive, while Elias is more bodily expressive.


SKoBots represent a more niche, yet more exciting, venture in language-learning robots than the previous two projects. A SKoBot is a 3D-printed companion that can sit on the user's shoulder and help its mainly Native American user base learn a Native American language. In this way, SKoBots cater to a wider range of users, including older students and independent learners. Most of the sample videos show the robots helping Native American teens strengthen their vocabulary and providing dynamic, real-time corrections. Admirable aspects of this robot include its unique design, ease of assembly, and low price point, which make it accessible to marginalized communities.

Van Den Berghe et al. (2018): In this review of social robots for language learning, the researchers suggest that children may be able to learn words from robotic agents as well as from human teachers, and may benefit as much from a robot's assistance as from that of a child peer. For a summary of the included study, see below:

Van Den Berghe et al. (2018)

Zinina et al. (2023): In this study, university linguistics students practiced vocabulary learning in Latin (a language foreign to them) and subsequently evaluated their experience and the performance of the robot as an assistant tutor. The robot assisted the learners by giving them words in their native language, in this case Russian, that are phonetically most similar to the Latin words being studied. The participants judged the robot to make a positive impression and reported increased motivation and a desire to use robot-assisted learning in the future.


Deng et al. (2024): In this review of robot-assisted language learning (RALL) for adults, the researchers suggest that interacting with robots allows learners to speak in a low-stress environment, improving fluency and confidence. Additionally, being taught by a physical robot, in contrast to one on video, was found to enhance performance on a vocabulary test and improve engagement. The study by Van den Berghe et al. (2021) used brain-computer interfaces to provide adaptive feedback, which the robot used to maintain the participant's attention. However, some studies indicated that the introduction of social robots did not significantly improve learning outcomes. Lastly, implicit teaching methods, such as conversational practice, could improve grammar, and some studies found that they also improve pronunciation.

Week 1 Updates

Problem statement and objectives:  

The world around us is ever-changing: technology evolves exponentially, and developments in computational power reshape our reality. It is therefore essential to understand how these changes affect us and how we can develop technology that contributes to the flourishing of the human species. Scenarios that once seemed to belong to science fiction novels are now being implemented by robotics developers. Understaffed fields that are less appealing to the broader public seem to benefit from the attention of these developers; applications such as care robots, educational assistant robots, and factory robots are popular topics among robotics enthusiasts. An aspect that is frequently overlooked lies in the depths of these robots' interaction with people, where intrinsically human characteristics, such as social norms, trust, meaning, culture, and emotions, play a central role. Learning and education are significant aspects of the human experience and contribute to the development of our species; therefore, this study focuses on Human-Robot Interaction (HRI) in the context of education, with an emphasis on the way information is delivered to the human receivers. Being a good educator is complex and involves many underlying characteristics, so this exploration addresses only a surface layer of what it takes to be a teacher: we examine which embodiment of the agent delivering the material leads to the best recall of the content.

Who are the users?

The main user group that would benefit from our research findings, as well as from any future development guidelines that stem from them, is university students, as they are the primary subgroup of students increasingly encountering robot-assisted learning environments (e.g., Pepper being used as a teaching assistant at Carnegie Mellon University). University students often face high cognitive loads throughout their studies; therefore, it is crucial to support them with efficient learning and retention strategies. When a robot teaches a student, one such strategy is to assign the most suitable type of agent for the learning context.

What do they require?

For university students, it is important to have access to effective and adaptive learning strategies that help them manage high cognitive loads and improve their retention of information. In robot-assisted education, this means that the robot working with the student needs to maintain a high level of engagement while facilitating understanding and optimizing recall. The content and context of the environment in which the robot and student interact need to be taken into account, as different situations will require different types of agents. Characteristics such as interaction style, verbal and non-verbal communication, and adaptability need to be clearly defined to best support learning. Additionally, students can benefit from personalized and interactive learning, in which robots adjust their approach to the individual's learning needs.

Useful vocabulary and theories:

The voice is an expressive aural medium of communication; it can be viewed as the "how" of vocalizations. The nonverbal, paralinguistic properties that characterize the voice, such as tone, loudness, pitch, and timbre, are called vocalics. Speech is the linguistic content of the voice, primarily consisting of words, grammar, syntax, and phonetics (Seaborn et al., 2021).

The voice effect holds that people learn better from multimedia instruction that includes a human voice rather than a machine-synthesized one (Craig & Schroeder, 2017). Recorded human voices provide an experience that is easier to identify as a social interaction, thus promoting active learning. This can be explained in terms of cognitive load: machine voices may impose extraneous cognitive load, reducing the cognitive resources available for integrating information with existing knowledge structures. Cognitive load could also increase if the agent adds visual or auditory distractions to the environment that require additional processing.

Auditory Encoding and Short-Term Recall

The study by Colle (1980) supports the central masking hypothesis, suggesting that auditory noise interferes with visual recall because the speech loop must pass through the preperceptual auditory store, where it gets masked by noise. This aligns with the idea that AI-generated speech, with its inconsistent flow and unnatural pauses, could function as a form of "structured noise," disrupting inner dialogue and reducing recall ability.  

Topic Interest and Incidental Learning

Cancino’s (2019) research highlights how topic interest significantly influences vocabulary retention in incidental learning settings. This effect is mediated by cognitive processing depth and dictionary use.  

Auditory vs. Visual Short-Term Memory

Tillmann & Caclin (2021) provide evidence that auditory memory generally outperforms visual memory, especially for materials with a clear auditory contour. This suggests that structured auditory stimuli might enhance recall, whereas less structured sounds (like AI speech with unnatural intonations) could have the opposite effect. A comparison between human and AI voices could further validate this.

Auditory Similarity Effects in Recall

The study by Connor & Hoyer (1967) reinforces the idea that phonological (auditory) similarity affects recall more than visual similarity. This suggests that if AI-generated speech has distortions or inconsistencies, it might interfere with phonological encoding, reducing recall accuracy.

AI Voices and Multimedia Learning

Mayer (2014) emphasizes that human voices enhance learning more than machine voices, as they foster a sense of social presence. However, McGinn & Torre’s (2020) study found that high-quality AI voices can be indistinguishable from human voices and do not necessarily impact learning outcomes. This is corroborated by Craig and Schroeder (2017) as well as Dinçer (2022), the latter specifically finding no cognitive load differences when using a modern synthetic voice and human speech.

Embodiment and Perception in Human-Robot Interaction

Studies by Wainer et al. (2006) and Seeger et al. (2018) show that physical embodiment enhances social presence and perception. However, the effect is nuanced since nonverbal cues alone can decrease perceived anthropomorphism due to the uncanny valley effect. If AI-generated speech is paired with a robotic presence, the combination of physical embodiment and voice type could influence recall.

Embodiment and learning

Pedagogical agents improve learning outcomes compared with no-embodiment conditions (static agents and/or no agent). Embodied agents, those that possess human-like characteristics such as facial expressions, gestures, lip synchronization, and body sway, significantly increase retention scores (Davis et al., 2022). Further support comes from Mayer and DaPra (2012), who found that learners performed better on a transfer test when a human-voiced agent displayed human-like gestures, facial expressions, eye gaze, and body movement than when it did not, yielding an embodiment effect. Participants in the study by Fiorini et al. (2024) reported greater arousal and dominance when interacting with embodied robots than with voice-only interfaces. In a similar vein, the study by Dennler et al. (2024) suggests that embodiment increases perceived capability, which may affect information retention. Such higher expectations may lead to increased engagement, but also to disappointment if they are unmet. More broadly, in their meta-analysis, Ouyang and Xu (2024) argue that instead of using robots to directly convey knowledge, instructors should use educational robotics to facilitate students' learning experience and act as facilitators who provide guidance and support.

Social Cues in Multimedia and Human-Robot Interaction

Mayer’s (2014) research also suggests that social cues like conversational tone and embodiment enhance learning, aligning with Admoni & Scassellati’s (2017) findings that gaze cues improve engagement and trust in robots. This could imply that AI voices in robots might be more effective if combined with gaze behavior and facial expressions, as suggested by Schömbs et al. (2023).

Body movements and tone of voice:

Velentza et al. (2021) found that robots with a cheerful personality and expressive body movements are more engaging and desirable for educational interactions. They also caution that overly friendly storytelling can reduce engagement, as it may come across as unnatural or excessive. Additionally, embodied robots using naturalistic gestures lead to higher perceived emotional engagement (Fiorini et al., 2024). These findings highlight the importance of synchronized verbal and non-verbal cues in improving communication effectiveness. Furthermore, users tend to expect more human-like behavior from robots with a physical body than from virtual ones (Dennler et al., 2024). Regarding pitch, Suzuki et al. (2003) found that humans are sensitive to even the slightest changes in synthetic voice pitch and can interpret these changes as either confirmation or negation, which can be an important factor in problem solving and a consideration for effective learning environments. Still, the voice should not be too cute, as that can hinder learning outcomes (Jing et al., 2024).

What we know about human preferences for robot voices:

Agents with masculine voices are perceived as more "informative", and social presence is rated higher when a robot's perceived gender matches its voice (Seaborn et al., 2021). This matters because higher perceived social presence is associated with improved learning outcomes (Craig & Schroeder, 2017). Both feminine and masculine voices are considered appropriate for educational settings (Seaborn et al., 2021). More specifically, the Nao robot with a masculine voice was perceived as friendlier and more trustworthy, and the masculine voice was judged a better overall fit for it (Seaborn et al., 2021). Additionally, the use of vocal fillers tends to enhance user experiences with voice agents: when robots used hedges and discourse markers, such as vocal fillers, people responded to them much as they would to humans (Seaborn et al., 2021).

References

  1. Hanford, E. (2017). Schools in poor, rural districts are the hardest hit by nation’s growing teacher shortage. APM Reports. Retrieved from https://www.apmreports.org/story/2017/08/28/rural-schools-teacher-shortage
  2. Koerting, K. (2017). Schools confront shortage of world language teachers. News Times. Retrieved from http://www.newstimes.com/local/article/Schools-confront-shortage-of-world-language-10996278.php
  3. Motoko, R. (2015). Teacher Shortages Spur a Nationwide Hiring Scramble (Credentials Optional). New York Times. Retrieved from http://www.nytimes.com/2015/08/10/us/teacher-shortages-spur-a-nationwide-hiring-scramble-credentials-optional.html?_r=0
  4. Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12, 257–285.
  5. Holloway, E. L. (1992). Supervision: A way of teaching and learning. In S. D. Brown & R. W. Lent (Eds.), Handbook of counseling psychology (2nd ed., pp. 177–214). John Wiley & Sons.
  6. Jones, R. W. (2007). Learning and Teaching in Small Groups: Characteristics, Benefits, Problems and Approaches. Anaesthesia and Intensive Care, 35(4), 587–592. https://doi.org/10.1177/0310057x0703500420
  7. EMYS robot + 3 sets. (2025). EMYS. https://www.emys.co/product-page/emys-robot-3-sets
  8. Elias Robot | Elias Robot. (2022). Elias Robot. https://www.eliasrobot.com/elias-robot-app
  9. SkoBots Language Learning. (n.d.). The STEAM Connection. https://www.steamconnection.org/skobots
  10. https://www.carasantamaria.com/podcast/danielle-boyer
  11. Craig, S. D., & Schroeder, N. L. (2017). Reconsidering the voice effect when learning from a virtual human. Computers & Education, 114, 193–205. https://doi.org/10.1016/j.compedu.2017.07.003
  12. Wainer, J., Feil-Seifer, D., Shell, D., & Matarić, M. (2006). The role of physical embodiment in human-robot interaction. ROMAN 2006 - the 15th IEEE International Symposium on Robot and Human Interactive Communication. https://doi.org/10.1109/roman.2006.314404
  13. Seeger, A.-M., Pfeiffer, J., & Heinzl, A. (2018). Designing Anthropomorphic Conversational Agents: Development and Empirical Evaluation of a Design Framework. Completed Research Paper. https://web.archive.org/web/20220802070748id_/https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1103&context=icis2018
  14. Tu, C.-H., & McIsaac, M. (2002). The Relationship of Social Presence and Interaction in Online Classes. American Journal of Distance Education, 16(3), 131–150. https://doi.org/10.1207/s15389286ajde1603_2
  15. Davis, R. O., Park, T., & Vincent, J. (2022). A Meta-Analytic Review on Embodied Pedagogical Agent Design and Testing Formats. Journal of Educational Computing Research, 61(1), 30–67. https://doi.org/10.1177/07356331221100556
  16. Mayer, R. E., & DaPra, C. S. (2012). An embodiment effect in computer-based learning with animated pedagogical agents. Journal of Experimental Psychology: Applied, 18(3), 239–252. https://doi.org/10.1037/a0028616
  17. Fiorini, L., D'Onofrio, G., Sorrentino, A., Cornacchia Loizzo, F., Russo, S., Ciccone, F., Giuliani, F., Sancarlo, D., & Cavallo, F. (2024). The Role of Coherent Robot Behavior and Embodiment in Emotion Perception and Recognition During Human-Robot Interaction: Experimental Study. JMIR Human Factors, 11, e45494. https://doi.org/10.2196/45494
  18. Dennler, N. S., Nikolaidis, S., & Matarić, M. (2024). Singing the Body Electric: The Impact of Robot Embodiment on User Expectations. arXiv, abs/2401.06977.
  19. Ouyang, F., & Xu, W. (2024). The effects of educational robotics in STEM education: A multilevel meta-analysis. International Journal of STEM Education, 11, 7. https://doi.org/10.1186/s40594-024-00469-4
  20. Van Den Berghe, R., Verhagen, J., Oudgenoeg-Paz, O., Van Der Ven, S., & Leseman, P. (2018). Social Robots for Language Learning: A Review. Review of Educational Research, 89(2), 259–295. https://doi.org/10.3102/0034654318821286
  21. Zinina, A., Kotov, A., Arinkin, N., & Zaidelman, L. (2023). Learning a foreign language vocabulary with a companion robot. Cognitive Systems Research, 77, 110–114. https://doi.org/10.1016/j.cogsys.2022.10.007
  22. Deng, Q., Fu, C., Ban, M., & Iio, T. (2024). A systematic review on robot-assisted language learning for adults. Frontiers in Psychology, 15. https://doi.org/10.3389/fpsyg.2024.1471370
  23. Macedonia, M., Mueller, K., & Friederici, A. (2010). Neural Correlates of High Performance in Foreign Language Vocabulary Learning. Mind, Brain, and Education, 4, 125–134. https://doi.org/10.1111/j.1751-228X.2010.01091.x
  24. Misty II | Misty Robotics. (n.d.). Misty Robotics. https://www.mistyrobotics.com/misty-ii
  25. Paas, F. G. W. C. (1992). Training strategies for attaining transfer of problem-solving skill in statistics: A cognitive-load approach. Journal of Educational Psychology, 84(4), 429–434. https://doi.org/10.1037/0022-0663.84.4.429
  26. Chen, N., Liu, X., Zhai, Y., et al. (2023). Development and validation of a robot social presence measurement dimension scale. Scientific Reports, 13, 2911. https://doi.org/10.1038/s41598-023-28817-4
  27. Bartneck, C., Kulić, D., Croft, E., et al. (2009). Measurement Instruments for the Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety of Robots. International Journal of Social Robotics, 1, 71–81. https://doi.org/10.1007/s12369-008-0001-3
  28. Carpinella, C. M., Wyman, A. B., Perez, M. A., & Stroessner, S. J. (2017). The Robotic Social Attributes Scale (RoSAS): Development and Validation. 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Vienna, Austria, pp. 254–262.