Latest revision as of 01:54, 11 April 2025

Group Members

Name	Student number	Study	E-mail
Andreas Sinharoy	1804987	Computer Science and Engineering	a.sinharoy@student.tue.nl
Luis Fernandez Gu	1804189	Computer Science and Engineering	l.fernandez.gu@student.tue.nl
Alex Gavriliu	1785060	Computer Science and Engineering	a.m.gavriliu@student.tue.nl
Theophile Guillet	1787039	Computer Science and Engineering	t.p.p.m.guillet@student.tue.nl
Petar Rustić	1747924	Applied Physics	p.rustic@student.tue.nl
Floris Bruin	1849662	Computer Science and Engineering	f.bruin@student.tue.nl

Introduction

Problem Statement

Diagnosing speech impairments in children is a complex and evolving area within modern healthcare. While early diagnosis is essential for effective treatment, current diagnostic practices can present challenges that may impact both accuracy and accessibility. Several studies and practitioner reports have highlighted inefficiencies in existing assessment models, especially for young children (e.g., ASHA, 2024^[1], leader.pubs.asha.org^[2]).

Traditionally, speech and language therapists (SLTs) rely on structured, in-person testing methods. These assessments often involve up to three hours of standardized questioning, which can be mentally exhausting for both the child and the therapist. Extended sessions may introduce human fatigue and subjectivity, increasing the risk of inconsistencies in analysis. Additionally, children—especially those between the ages of 3 and 5—may struggle to remain focused and responsive during lengthy appointments, which can affect the quality and reliability of their responses.

Another significant consideration is the environment in which assessments take place. Therapist offices can feel unfamiliar or intimidating to young children, potentially affecting their comfort level and behavior during testing. The child’s perception of the therapist, the formality of the setting, and the pressure of the situation can all influence the outcome of the diagnostic process. Moreover, both therapist and child performance may decline over time, introducing bias and reducing the diagnostic clarity.

To address these challenges, this project proposes the use of an interactive, speech-focused diagnostic robot designed to support and enhance speech impairment assessments. This tool aims not to replace therapists but to assist them by mitigating known sources of bias and fatigue while making the process more accessible and engaging for children.

Firstly, it questions these children consistently, not suffering from its questioning decreasing in quality as the tests progress. Secondly, it records the child's responses, and provides an adequate level of preprocessing to allow the results of the test to be easily and conveniently analyzed by the therapist, allowing the therapist to be able to listen to the recording multiple times, as well as when they have the energy and state of mind to conveniently do so. Finally, it partitions the exam into smaller, gamified chunks - while still ensuring the test provides the same effectiveness as a full three hour session - which offers the children breaks and a more adequate interface which can improve the quality of their responses. Also its friendly appearing, non-threatening interface may provide the child with a more comfortable and engaging exam experience - although this will need to be tested empirically to analyze its effectiveness.

Other benefits such a robot provides include a separation between the therapist's office and the child's location. This allows for diagnoses to be taken in locations where access to therapists is limited or nonexistent as the responses from the robot can be sent digitally to therapists abroad. It also allows for a diagnosis from multiple therapists, as with consent from the child's guardians, multiple therapists can have access to the recordings and results from the tests allowing for a more robust and certified diagnosis from the therapists as well as reducing bias present in any one therapist.

Objectives

Create a robot which can ask questions, record answers, provide some basic preprocessing to answers, and possess suitable paths of executions for unexpected use cases - for example children crying, incoherent responses, etc.
Perform multiple tests to understand the effectiveness of the robot. The first being a test analyzing how well a child interacts with the robot. The second test being a comparison between an analysis from a therapist of the results from the device and the standard results from a current test.
Test how well the robot performs when encountered with unexpected use cases.

USE Analysis

Users

Speech-Language Therapist (Primary User)

Clinicians that are dedicated professionals responsible for diagnosing and treating children with speech, language, and communication challenges. They are often under high workloads and constant pressure to deliver accurate assessments quickly.

Needs:

Efficient Diagnostic Processes: Therapists require engaging diagnostic methods that transform lengthy traditional assessments into shorter, gamified sessions to minimize child fatigue and cognitive overload (Sievertsen et al., 2016^[3]). Utilizing robots to conduct playful tasks significantly enhances child engagement, allowing therapists to better manage their time and increase diagnostic accuracy (Estévez et al., 2021^[4]). Interviews with the primary user group suggest that the robot can specifically target repetitive "production constraint" tasks, reducing the number of repetitions needed, which in turn reduces session time and patient fatigue.
Comprehensive Data Capture and Objective Analysis: Therapists identify data analysis as the most time-consuming aspect of diagnostics. Interviews indicate that high-quality audio and interaction data captured during robot-assisted sessions enable therapists to conduct thorough asynchronous reviews, significantly streamlining the analysis process. The robot’s ability to provide unbiased assessments, free from subjective influences such as therapist fatigue or emotional state, improves diagnostic reliability (Moran & Tai, 2001^[5]).
Intuitive Digital Tools: Therapists benefit from secure, well-designed dashboards that streamline patient management, enable detailed annotations, and support remote, asynchronous assessments. A case study (Kushniruk & Borycki, 2006^[6]) in the Journal of Biomedical Informatics shows that intuitive clinical interfaces significantly improve diagnostic accuracy and reduce clinician errors by minimizing cognitive load and decision fatigue in health technology systems. When paired with privacy-by-design principles that align with legal frameworks like GDPR, these tools not only enhance usability but also maintain strict data protection standards.

Child Patient (Secondary User)

Young children aged 5–10, who are the focus of speech assessments, often experience anxiety, discomfort or boredom in traditional clinical environments. Their engagement in therapy sessions is crucial for accurate assessment and treatment outcomes.

This age range has been identified as the most suitable for early diagnosis due to several factors. By the age of 4 to 5, most children can produce nearly all speech sounds correctly, making it easier to identify atypical patterns or delays (ASHA^[7], 2024; RCSLT, UK^[8]). Children within this range also possess a higher level of cognitive and behavioral development, which allows them to better understand and follow test instructions—critical for accurate assessments.

Early identification is crucial because speech sound disorders (SSDs) often begin to emerge at this stage. Addressing these issues early greatly improves outcomes in speech development, literacy, and overall academic performance (NHS, 2023^[9]; European Journal of Pediatrics, 2013^[10]).

While some speech impairments are not apparent until after age 5—such as developmental language disorders or fluency issues—the younger demographic (ages 3–5) poses a unique challenge. Children aged 3–5 often have limited attention spans (6–12 minutes) and are in Piaget’s preoperational stage of development, making it essential for diagnostic tools to be simple, engaging, and developmentally appropriate to ensure accurate assessment (CNLD.org^[11]). Children in this group are often less willing or able to engage with structured diagnostic tasks due to limited attention spans and mental maturity. Therefore, it is essential that diagnostic tools are designed with this in mind, using engaging, age-appropriate methods that ensure accurate and efficient assessment.

Needs:

Interactive, Game-Like Experience: Gamification significantly boosts child engagement and motivation, reducing anxiety and fatigue (Zakrzewski, 2024^[12]). Children interacting with socially assistive robots show higher attention spans, improved participation, and reduced stress compared to traditional assessments (Shafiei et al., 2023^[13]). Interviews with therapists indicate that children often find traditional diagnostics "long," "tiring," and "laborious," underscoring the need for more engaging, playful methodologies.

Immediate, Clear Feedback: Immediate visual and auditory feedback from robots guides children effectively, providing reinforcement and maintaining high engagement levels (Grossinho, 2017^[14]). According to user interviews, such real-time feedback helps mitigate frustration during challenging speech tasks, supporting sustained engagement and more reliable outcomes.

Parent/Caregiver (Support User)

Parents or caregivers play an essential role in supporting the child’s therapy and need to feel confident that the process is both secure and effective. To ensure the therapy is reinforced outside the clinical setting, the parents need to be fully on-board.

Needs:

Data Security and Transparency: Caregivers consistently express the need for robust privacy protections. Interviews and research indicate that many parents hesitate to adopt digital tools unless they are confident that their child’s recordings and personal data are securely encrypted, stored in compliance with healthcare regulations (e.g., HIPAA/GDPR), and used solely for clinical benefit. Transparent consent mechanisms and clear explanations of data use build trust and increase engagement (Houser, Flite, & Foster, 2023^[15]). Parents want assurance that they retain control over data visibility, which reinforces their role as advocates for their child’s safety.

Progress Monitoring: Parents expect to stay informed about their child’s development. A user-friendly caregiver dashboard or app interface should display digestible progress summaries, such as completed tasks, improvements in specific sounds, and visual/audio comparisons over time. This empowers parents to understand their child’s journey and reinforces their sense of involvement. Caregiver interviews showed that getting clear updates after each session, e.g. “Jonas practiced 20 words and improved on ‘s’ sounds”, helped parents feel more motivated and involved. This kind of feedback makes it easier for parents to support practice at home and builds a stronger connection between families and therapists (Roulstone et al., 2013^[16]; Passalacqua & Perlmutter, 2022^[17]).

Personas

Personaboard, Secondary User

Persona 1: Parent

Rio Morales is a 36-year-old Dominican mom, market analyst, and creative spirit living in San Diego with her husband Jeff and their 4-year-old son, Miles. Kind, curious, and deeply supportive, Rio is navigating the early stages of her son’s speech therapy journey. While she’s highly motivated to help, she often feels overwhelmed by complex medical jargon and unsure if she’s “doing it right” at home.

She thrives on emotional connection and values intuitive, playful approaches over rigid, clinical methods. Rio’s goal is to turn speech therapy into something that feels like quality time, not homework. With strong emotional intelligence, a knack for design and analysis, and a love for hands-on learning, Rio is looking for tools that fit seamlessly into family life and empower her to support Miles with confidence and ease.

Personaboard, Secondary User

Persona 2: Child

Miles Morales is a bright, energetic 4-year-old with a big imagination and a love for dinosaurs, plush toys, and silly sounds. Recently enrolled in preschool, Miles is navigating early challenges with speech and communication. While he’s playful and emotionally intuitive, he can get easily distracted, especially by anything more serious or structured. He tends to avoid speaking when unsure, but thrives when communication is fun, safe, and full of encouragement.

Miles connects deeply with his toys, especially ones that “talk,” and learns best through imaginative play and sound imitation. He responds well to gentle guidance and praise, and needs speech support to feel more like play than practice. With the right support, especially one that taps into his world of fun and fantasy, Miles has everything he needs to grow more confident with his words.

Personaboard, Primary User

Persona 3: speech therapist

Dr. Maya Chen is a seasoned pediatric speech-language pathologist with over 15 years of experience, based in Seattle. Analytical, practical, and deeply empathetic, Maya blends clinical precision with a love for play-based learning. She specializes in early intervention and thrives when tools are both evidence-based and engaging for kids.

Her biggest frustration is when therapy tools lack functionality or confuse parents rather than empower them. Maya seeks to bridge that gap, helping families feel confident at home while ensuring progress stays on track. She values adaptable resources that align with developmental stages and support her structured-yet-playful therapy style. With expertise in diagnostics, AAC systems, and parent coaching, Dr. Chen is always on the lookout for tools that are not only fun for kids but meaningful and effective in therapy.

Scenarios

Scenario 1:

Rio has just come home after picking up Miles from school. Knowing that it's important to continue gathering diagnostic data for his speech therapy, she pulls out Rene, the plush elephant, and switches him on. Rene immediately comes to life with a cheerful sparkle in his voice, greeting Miles in a whimsical tone: “Hey there, Miles! Would you like to have a little chat with me today?” Miles lights up with a smile and, remembering what to do, squeezes Rene’s left paw to begin. Rene’s soft glow pulses as he starts asking questions, each one phrased in a playful and inviting way to make the experience feel like a fun game rather than a test.

Miles follows Rene’s instructions eagerly at first. When asked, “What sound does a dog make?” he responds with a confident “Woof!” pressing Rene’s trunk to record his answer. Rene reacts with delight, “Great job, Miles!” and seamlessly moves to the next question. The session continues smoothly for the first few minutes. Each of Miles’ responses is neatly timestamped and logged as Rene records when prompted by the trunk being pressed. But soon, Rene asks a question that catches Miles off guard. Miles hesitates, then quietly presses the trunk without saying anything, producing a barely-there 1 millisecond audio clip. Rene, unfazed, still registers it as a response and moves on without drawing attention to the missed moment.

After that, it becomes clear that Miles is losing interest. His answers become silly—giggling, making random noises, and throwing in nonsense words. Still, Rene patiently continues, recording each one and storing them just like any other response. Eventually, Miles gives Rene’s right paw a squeeze, signalling that he’s done for the day. Rene instantly switches tone, saying goodbye in a warm voice: “That was so much fun, Miles! Let’s talk again another time, okay?”

As soon as the session ends, Rene automatically begins the upload process. All the audio recordings—whether clear, silly, or silent—are sent securely to Miles’ encrypted profile in the therapy database. Once received, the system begins analysing the data, scanning each response for clarity, duration, and consistency. It flags any problematic or ambiguous responses, including the one-millisecond silent clip and the burst of unrelated sounds toward the end. These are added to a follow-up list that Rene will be programmed to revisit in a future session, adapting his script accordingly to ensure each critical diagnostic point is eventually covered.

At the same time, the therapist’s companion app receives a quiet notification: new recordings are available. When she logs in, she’ll see exactly which questions were answered, which ones need clarification, and can even listen to the audio if needed. From this playful interaction between a child and a soft toy, meaningful diagnostic data has been collected—seamlessly, naturally, and without stress.

Story board for scenario

Requirements

For the Therapist

Robust Hardware Integration: The system must incorporate reliable and durable hardware to ensure diagnostic sessions are completed without data loss or interruption. The design should aim to minimise technical failures during assessments, ensuring that every session's data remains intact and can be reviewed later.
User-Friendly Dashboard: An intuitive and efficient digital dashboard is required to present clinicians with information on each recorded session. The dashboard should facilitate rapid review and analysis, with the goal of enabling therapists to quickly identify patterns or issues in speech. By streamlining navigation and data review, the tool should help therapists manage multiple patients efficiently while maintaining high diagnostic accuracy.
Secure Remote Accessibility: In today’s increasingly digital healthcare environment, therapists must be able to access patient data remotely. The system must employ state-of-the-art encryption and robust user authentication protocols to protect sensitive patient data from unauthorised access. Having robust security is crucial for both clinicians and patients, as it reassures all parties that the integrity and confidentiality of the clinical data are maintained as per today’s standards.

For the Child and Parent

Engaging and Comfortable Design: For young children, the physical design of the robot plays a crucial role in therapy success. A soft, plush exterior coupled with interactive buttons and LED feedback systems can create a friendly, non-intimidating interface that reduces fosters a positive human robot interaction. Ultimately, the sessions should be a fun experience for the child, as otherwise no progress would be made and no speech data would be collected.
Responsive Feedback Systems: Dynamic auditory and visual cues are essential components that should guide children through each step of a session. Real-time feedback, such as flashing LEDs synchronised with encouraging sound effects, aims to help the child understand when they are performing correctly, and gently corrects mistakes when necessary. This immediate reinforcement not only keeps the child engaged and motivated but also provides parents with clear, observable evidence of their child’s progress. In essence, cues ensure that the therapy sessions are both interactive and instructive.
Robust Data Security: The system must implement comprehensive security measures such as end-to-end encryption and secure storage protocols to prevent unauthorised access or data breaches. The level of protection must reassure both parents and therapists that the child’s data is handled with the highest level of care and confidentiality. Adhering strictly to healthcare regulations is essential to maintain trust and protect privacy throughout the therapy process.

Society

Early, engaging diagnostic and intervention tools, such as a speech companion robot, offer substantial benefits for children with communication impairments and society at large. Timely identification and therapy in the preschool years can prevent persistent academic, communication, and social difficulties that often emerge when early intervention is missed, with broad effects on educational attainment and life opportunities (Hitchcock, E.R., et al., 2015^[18]). Research further shows that developmental language disorders, if not addressed, can impede literacy and learning across all curriculum areas (Ziegenfusz, S., et al., 2022^[19]). By delivering engaging, play-based practice at an early age, a robot helps close this gap, motivating children through game-like exercises and positive reinforcement, which in turn promotes consistent practice and faster skill gains. Significantly, trials of socially assistive robots in speech therapy report notable improvements in children’s linguistic skills, as therapists observe that robots keep young learners more engaged and positive during sessions (Spitale et al., 2023^[20]).

Beyond facilitating better speech outcomes, early intervention promotes social inclusion. Communication ability is a key factor in social development, and deficits often undermine peer interaction and participation into adulthood^[18]. Even mild speech impairments can result in social withdrawal or negative peer perceptions, affecting confidence and classroom integration. By improving intelligibility and expressive skills in the preschool years, early intervention equips children to engage more fully with teachers and classmates, fostering an inclusive learning environment. Enhancing communication skills by age 5 has been linked to better social relationships and academic performance^[19]. A therapy robot that makes speech practice fun and accessible therefore not only accelerates skill development but also strengthens a child’s long-term social and educational trajectory.

Furthermore, digitally enabled tools can dramatically increase access to speech-language services and mitigate regional disparities in care. Demand for speech therapy outstrips supply in many European countries: in England, for example, over 75,000 children were on waitlists in 2024, with many facing delays of a year or longer (The Independent, 2024^[21]), while in Ireland some families waited over two years or even missed the critical early intervention window entirely (Sensational Kids, 2024^[22]). A digital companion robot can alleviate these bottlenecks by providing preliminary assessments and guided practice without requiring a specialist on-site for every session. Telehealth findings during the COVID-19 pandemic confirm that remote services can effectively deliver therapy and improve access, allowing children in rural or underserved areas to begin articulation work or screening immediately, with therapists supervising progress asynchronously. This approach lowers wait times, ensures earlier intervention, and reduces diagnostic disparities across Europe. Moreover, as the EU moves toward interoperable digital health networks (Siderius, L., et al., 2023^[23]), data and expertise can be shared seamlessly across borders, enabling a harmonized standard of care. An EU-wide rollout of speech companion robots could thus accelerate early intervention for all children, regardless of geography, and foster continual refinement of therapy protocols through aggregated insights.

Enterprise

Implementing a plush robotic therapy assistant could reduce long-term costs for clinics and schools by supplementing human therapists and delegating their workloads to the robotic assistants. Despite the high initial investment, the robot's ability to handle repetitive practice drills and engage multiple children simultaneously could significantly increase therapists' efficiency. Many children with speech impairments require extensive therapy, approximately 5% of children have speech sound disorders needing intervention (CDC, 2015^[24]). Robotic aides that accelerate progress and facilitate frequent practice could substantially reduce total therapy hours required, enabling therapists to manage higher caseloads and allocate more time to complex, personalized interventions.

The market for therapeutic speech robots spans both healthcare and education sectors. For example, in the U.S. alone, nearly 8% of children aged 3–17 have communication disorders, with speech issues being most prevalent^[24]. This demographic represents millions of children domestically and globally. Clinics, educational institutions, and private practices serving these children constitute a significant market. Additional market potential exists among children with autism spectrum disorders or developmental language delays, further expanding the reach of such technology. Increasing awareness and investments in inclusive education and early interventions suggest robust market growth potential if therapeutic robots demonstrate clear efficacy and user acceptance.

Robotic solutions offer substantial scalability. After development and validation, plush therapeutic robots could be widely implemented across classrooms, speech therapy clinics, hospitals and homes. Centralized software updates could uniformly introduce new therapy activities or languages, maintaining consistent therapy quality. Robots could also enable group sessions, increasing therapy accessibility and efficiency. Remote operation capabilities could further extend services to underserved regions lacking on-site specialists, thereby addressing workforce shortages and regional disparities in speech therapy services.

Adoption barriers include initial cost, potential skepticism from traditional practitioners, and concerns from parents regarding effectivenes. Training requirements, technological reliability, privacy, data security issues, and regulatory approval processes also pose significant challenges. Demonstrating robust clinical efficacy through research is essential to encourage cautious institutions to adopt this technology. Addressing privacy concerns through secure data handling practices and obtaining necessary regulatory approvals will be critical in mitigating these barriers. Successful integration requires plush robots to complement existing therapeutic infrastructures as supportive tools rather than replacements. Robots could guide structured therapy exercises prepared by therapists, allowing therapist intervention as needed. Data collected, such as task performance, pronunciation accuracy, and engagement metrics, could seamlessly integrate into electronic health records and individualized education plans. Reliable maintenance and robust technical support from institutional IT teams or vendors are necessary to sustain long-term functionality. Effective integration ensures robots become routine tools in speech therapy, similar to interactive software and educational materials, aligning well with evolving EU digital health strategies and facilitating standardized practices across institutions (European Comission, eHealth).

State of the Art

Literature study

Traditional pediatric speech assessments often require hour-long in-person sessions, which can exhaust the child and even the clinician. Research shows that young children’s performance suffers as cognitive fatigue sets in: after long periods of sustained listening or test-taking, children exhibit more lapses in attention, slower responses, and declining accuracy (Key et al., 2017^[25]; Sievertsen et al., 2016^[3]). Preschool-aged children (≈3–5 years old) are especially prone to losing focus during extended diagnostics. In fact, to accommodate their limited attention spans, researchers frequently shorten and simplify tasks for this age group (Finneran et al., 2009^[26]). Even brief extensions in task length can markedly worsen a young child’s performance; for example, one study found that increasing a continuous attention task from 6 to 9 minutes led 4–5 year olds with attentional difficulties to make significantly more errors (Mariani & Barkley, 1997^[27]). Together, these findings suggest that lengthy, single-session evaluations may introduce cognitive fatigue and inconsistency into the results, i.e. a child tested at the beginning of a multi-hour session may perform very differently than at the end, simply due to waning concentration and energy. Further supporting this concern, studies on pediatric neuropsychological testing have emphasized the need for flexible, modular test to prevent fatigue-related bias. For instance, studies (Plante et al., 2019^[28]) highlight how short therapy sessions can be effective. This is because attention and memory performance in young children are heavily influenced by to time-on-task effects, recommending shorter tasks with breaks in between.

Another challenge is that the clinical setting itself can skew a child’s behavior and communication, potentially affecting diagnostic outcomes. Children are often taken out of familiar environments and placed in a sterile clinic or hospital room for these assessments. This change can lead to atypical behavior: some kids become anxious or reserved, while others might act out due to discomfort. Notably, children’s speech and language abilities observed in a clinic may not reflect their true skills in a natural environment. In a classic study, Scott and Taylor (1978^[29]) found that preschool children produced significantly longer and more complex utterances at home with a parent than they did in a clinic with an unfamiliar examiner. The home setting elicited richer language (more past-tense verbs, longer sentences, etc.), whereas the clinic samples were shorter and simpler^[29]. Similarly, Hauge et al. (2023^[30]) stress that conventional testing environments, often unfamiliar and rigid, can increase stress and distractibility in children, adding to fatigue-related performance issues. This suggests that standard clinical evaluations could underestimate a child’s capabilities if the child is uneasy or not fully engaged in that setting. Factors like unfamiliar adults, strange equipment, or the feeling of being “tested” can all negatively impact a young child’s behavior during assessment. Thus, there is a clear motivation to explore assessment methods that make children feel more comfortable and interested, in order to obtain more consistent and representative results.

To address these issues, some recent work has turned to socially assistive robots (SARs) and gamified interactive tools as innovative means of conducting speech-language assessment and therapy. The underlying idea is that a friendly robot or game can transform a tedious evaluation into a fun, engaging interaction. There is growing evidence that such approaches can indeed improve children’s engagement, yield more consistent participation, and broaden accessibility. For instance, game-based learning techniques have been shown to boost motivation and focus: multiple studies (including meta-analyses) conclude that using “serious games” or gamified tasks significantly increases students’ engagement and learning in therapeutic or educational contexts (Brackenbury & Kopf, 2022^[31]). In speech-language therapy, this means a child might persevere longer and respond more enthusiastically when the session feels like play rather than an exam. Social robots, similarly, can serve as charismatic partners that hold a child’s attention. In one 8-week intervention study, a socially assistive humanoid robot was used to help deliver language therapy to children with speech impairments (Spitale et al., 2023^[20]). The researchers found significant improvements in the children’s linguistic skills over the program, comparable to those made via traditional therapy. Importantly, children working with the physical robot displayed greater engagement; measured by eye gaze, attention, and number of vocal responses – than those who received the same therapy through a screen-based virtual agent^[20]. In the robot-assisted sessions, children not only spoke more, but also stayed motivated and positive throughout, as the robot’s interactive games and feedback kept them interested. Therapists initially approached the technology with some scepticism, but after hands-on experience they reported that the robot helped maintain the children’s focus and could be a useful tool for keeping young clients motivated^[20]. Such findings align with a broader trend in pediatric care; socially assistive robots and interactive platforms can mitigate boredom and fatigue by making therapy more engaging, all while delivering structured practice. Another advantage is consistency and unlike human evaluators who might inadvertently vary their prompts or become tired, a robot can present therapy exercises in a highly standardized way each time. This consistency, paired with the robot’s playful demeanour, can lead to more reliable assessments and enjoyable therapy sessions that children look forward to rather than dread.

In addition to robot-assisted solutions, researchers are also exploring telehealth and asynchronous models to improve accessibility and flexibility of speech-language services. Telehealth (remote therapy via video call) saw a huge expansion during the COVID-19 pandemic, but its utility extends beyond that context. Studies have demonstrated that tele-speech therapy can be highly effective for children, often yielding progress equivalent to in-person therapy while removing geographic and scheduling barriers (Fekar Gharamaleki & Nazari, 2024^[32]). By delivering evaluation and treatment over a secure video link, clinicians can reach families who live far from specialists or who have difficulty traveling to appointments. Parents and children have reported high satisfaction with teletherapy, citing benefits such as conducting sessions in the child’s own home (where they may be more comfortable and attentive) and easier scheduling for busy families. Beyond live video sessions, asynchronous approaches are being tried as well. In an asynchronous model, a clinician might prepare therapy activities or prompts in advance for the family to use at their convenience – for example, a tablet-based app that records the child’s speech attempts, which the clinician reviews later. Research (Vaezipour et al., 2020^[33]) is being done in this field, looking at what kind of benefits this can have. This “store-and-forward” technique allows therapy to happen in short bursts when the child is most receptive, rather than insisting on a fixed appointment time. Hill and Breslin (2016^[34]) describe a platform where speech-language pathologists could upload tailored exercises to a mobile app; children would complete the exercises (with the app capturing their responses) at home, and the therapist would asynchronously monitor the results and update tasks as needed. Such a system removes the need to coordinate schedules in real time and can reduce the burden of 2-hour sessions by spreading practice out in more frequent, bite-sized interactions. Early evaluations of these models indicate they can improve efficiency without compromising outcomes, though they require careful planning and reliable technology^[34]. AI-driven platforms, in particular, can adapt to a child’s responses and maintain motivation through personalized challenges and praise, making them especially suitable for preschoolers with limited attention spans (Utepbayeva et al., 2022^[35]) (Bahrdwaj et al., 2024^[36]). Overall, the rise of telehealth in speech-language pathology shows promise for making assessment and therapy more flexible, family-centered, and accessible, especially for young children who may do better with multiple shorter interactions than a marathon clinic visit. It also means clinicians can observe children in naturalistic home settings via video, potentially gaining a more ecologically valid picture of the child’s communication skills than a clinic observation would provide.

Looking ahead, the integration of wearable sensors, voice-activated assistants, and augmented reality (AR) platforms may become options available for pediatric speech-language therapy. These innovations offer the potential to track subtle speech metrics throughout the day, deliver immersive learning experiences, and provide real-time coaching in environments more natural for the child (Wandi et al., 2023). For instance, smart microphones embedded in toys or home devices could collect speech samples during play, giving clinicians richer data without requiring formal testing sessions. Meanwhile, AR systems might enable children to interact with animated characters that prompt speech practice through storytelling or collaborative games, making therapy feel more like an adventure than a clinical task. As these technologies develop, they could improve both assessment accuracy and child engagement, especially when combined with clinician oversight and family participation.

Finally, any move toward robot-assisted or digital speech therapy must be accompanied by rigorous attention to privacy, consent, and ethics. Working with children introduces critical ethical responsibilities. Informed consent in pediatric settings actually involves parental consent (and when possible, the child’s assent); parents or guardians need to fully understand and agree to how an AI or robot will interact with their child and what data will be collected. Developers of such tools are urged to adopt a "privacy-by-design" approach, ensuring that any audio/video recordings or personal data from children are securely stored and used only for their intended therapeutic purposes (Lutz et al., 2019^[37]). Privacy considerations for socially assistive robots go beyond just data encryption; for example, a robot equipped with cameras and microphones could intrude on a family’s privacy if it is always observing, so clear limits and transparency about when it is “recording” are essential^[37]. Ethical guidelines also emphasize that children’s participation should be voluntary and respectful: a child should never be forced or deceived by an AI system. One recent field study noted concerns about trust and deception when young children interact with social robots: kids might innocently over-trust a robot’s instructions or grow distressed if the robot malfunctions or behaves unpredictably (Singh et al., 2022^[38]). To address this, designers strive to make robots safe and reliable, and clinicians are careful to explain the robot’s role (for instance, framing it as a helper or toy, not as an all-knowing authority). It’s also important to maintain human oversight: the goal is to assist, not replace, the speech-language pathologist. Ethically designed systems will flag any serious issues to a human clinician and allow parents to opt out at any time. In summary, child-centric AI and robotics projects must prioritize the child’s well-being, autonomy, and rights at every stage. This involves obtaining proper consent, safeguarding sensitive information, and transparently integrating technology in a way that complements traditional care. By doing so, innovative speech diagnostic and therapy tools can be both cutting-edge and responsible, ultimately delivering engaging, consistent, and accessible support for children’s communication development without compromising ethics or privacy.

Existing robots

To understand how we can create the best robot for our users, we have to look at what robots already exists relating to our project. We analyzed the following robots and related them to how we can use them for our robot.

RASA robot

The RASA (Robotic Assistant for Speech Assessment) robot is a socially assistive robot developed to enhance speech therapy sessions for children with language disorders. The robot is used during speech therapy sessions for children with language disorders. The robot uses facial expressions that make therapy sessions more engaging for the children. The robot also uses a camera that uses facial expression recognition with convolutional neural networks to detect the way the children are speaking. This helps the therapist in improving the child's speech. Studies have shown that incorporating the RASA robot into therapy sessions increases children's engagement and improves language development outcomes.

The RASA robot

Automatic Speech Recognition

Recent advancements in Automatic speech recognition (ASR) technology have led to systems capable of analyzing children's speech to detect pronunciation issues. For instance, a study fine-tuned the wav2vec 2.0 XLS-R model to recognize speech as pronounced by children with speech sound disorders, achieving approximately 90% accuracy in matching human annotations. ASR technology streamlines the diagnostic process for clinicians, saving time in the diagnosing process.

Nao robot

Developed by Aldebaran Robotics, the Nao robot is a programmable humanoid robot widely used in educational and therapeutic settings. Its advanced speech recognition and production capabilities make it a valuable tool in assisting speech therapy for children, helping to identify and correct speech impediments through interactive sessions.

The Nao robot

Kaspar robot

Kaspar is a socially assistive robot whose purpose is to help children with autism learn social and communication skills. A child-centric appearance and expressive behavior are given prominence in the design in order to invite users to engage in interactive activity. Studies ^[39] have indicated that children working with Kaspar show improved social responsiveness, and the same design principles could be applied to enhancing speech therapy outcomes.

Kaspar the social robot, University of Hertfordshire

Requirements

The aim of this project is to develop a speech therapy plush robot specifically designed to address issues in misdiagnoses of speech impediments in children. Our plush robot solution addresses these issues by transforming lengthy, traditional speech assessments into engaging, short, interactive sessions that are more enjoyable and less exhausting for both patients (children aged 5-10) and therapists. By providing an interactive plush toy equipped with audio recording, playback capabilities, local data storage, intuitive user interaction via buttons and LEDs, and secure data transfer, we aim to significantly reduce the strain on SLT resources and enhance diagnostic accuracy and patient engagement.

To be able to achieve all of this we must satisfy a certain set of requirements that will outline the requirements, design considerations, user interactions, data handling, performance goals, and a test plan, all of which will be done following a MOSCOW prioritization method:

MOSCOW Prioritization Model
Category	Definition
Must Have	Essential for core functionality; mandatory for the product's success.
Should Have	Important but not essential; enhances usability significantly.
Could Have	Beneficial enhancements that can be postponed if necessary.
Won’t Have	Explicitly excluded from current scope and version.

Design Specifications

Physical and Interaction Design Requirements
Ref	Requirement	Priority
DS1	Plush toy casing made from child-safe, non-toxic materials.	Must
DS2	Secure internal mounting of microphone, speaker, battery, and processing units.	Must
DS3	Accessible button controls (Power, Next Question, Stop Recording).	Must
DS4	Visual feedback via LEDs and Movement (eg. Recording, Test Complete, Error, Low Storage, Low Battery).	Must
DS5	The robot casing should withstand minor physical impacts or drops.	Should
DS6	Device will not be waterproof.	Won’t

The plush robot’s physical design prioritizes child safety, comfort, and ease of interaction. Using child-safe materials (DS1) and securely mounted electronics (DS2) ensures safety, while intuitive button controls (DS3) and clear LED indicators (DS4) simplify usage for young children. Durability considerations (DS5) shall ensure the product withstands typical child handling (or any handling...), though waterproofing (DS6) is excluded due to practical constraints and the lack of its necessity given the setting it will be used in.

The test plan is as follows:

Test Cases Design Specifications
Ref	Precondition	Action	Expected Output
DS1	N/A	Inspect the plush toy casing and its material certificates/specifications.	Plush toy casing is certified as child-safe and non-toxic.
DS2	N/A	Inspect internal components of the plush toy.	Microphone, speaker, battery, and processing units are securely mounted internally.
DS3	Plush robot is powered on.	Press each button (Power, Next Question, Stop Recording).	Each button activates its designated functionality immediately.
DS4	Plush robot is powered on.	Observe LED and any movement indicators when performing different operations (recording, completing test, causing an error, storage nearly full, battery low).	LEDs correctly indicate each state as intended.
DS5	N/A	Simulate minor drops or impacts on a safe surface.	Plush robot casing withstands minor impacts without significant damage or loss of functionality.
DS6	N/A	N/A	N/A

Functionalities

Functional Requirements
Ref	Requirement	Priority
FR1	Provide pre-recorded prompts for speech exercises.	Must
FR2	Capture high-quality audio recordings locally in WAV format.	Must
FR3	Securely store audio locally to maintain data privacy.	Must
FR4	Enable intuitive button-based session controls.	Must
FR5	Support secure data transfer via the Internet, USB and/or Bluetooth.	Must
FR6	Implement basic noise-filtering algorithms.	Should
FR7	Automatically shut down after prolonged inactivity to conserve battery.	Should
FR8	Optional admin panel for therapists to configure exercises and session lengths.	Could
FR9	Optional voice activation for hands-free interaction.	Could
FR10	Explicitly exclude cloud storage for privacy compliance.	Won’t

These functional requirements reflect: the necessity to simplify lengthy therapy sessions into manageable segments (FR1, FR4), securely capture and store speech data for later analysis (FR2, FR3, FR5), and maintain user-friendly interaction. Privacy and data security are prioritized by excluding cloud storage (FR10), while noise filtering (FR6), auto-shutdown (FR7), admin panel (FR8), and voice activation (FR9) further enhance practical usability and efficiency but are not essential for the functionality of the robot per se.

The test plan is as follows:

Functionalities Test Plan
Ref	Precondition	Action	Expected Output
FR1	Plush robot is powered on.	Initiate a therapy session.	Robot plays clear pre-recorded speech exercise prompts correctly.
FR2	Plush robot is powered on and ready to record.	Record speech audio using provided prompts.	High-quality WAV audio recordings are stored locally.
FR3	Recording completed.	Inspect local storage on plush robot (either via ssh or usb etc.)	Audio recordings are securely stored locally.
FR4	Plush robot is powered on.	Use button controls to navigate through a session.	Session navigation (Start/Stop, Next Question) operates smoothly via buttons.
FR5	Plush robot has recorded data.	Connect robot via USB/Bluetooth to transfer data securely.	Data transfer via USB/Bluetooth completes successfully with no data corruption or leaks.
FR6	Robot is powered on and ready to record.	Record speech in a moderately noisy environment.	Recorded audio demonstrates effective noise filtering with reduced background noise.
FR7	Plush robot powered on, idle for prolonged period.	Leave robot idle for X+ minutes.	Robot automatically powers off to conserve battery.
FR8	Admin panel feature implemented.	Therapist configures new exercises/session lengths via admin panel.	Admin panel accurately saves and applies changes to exercises/session durations.
FR9	Voice activation implemented.	Use voice commands to navigate exercises.	Robot successfully responds to voice commands.
FR10	Check product documentation/design specification.	Inspect data storage and upload protocols.	Confirm explicitly stated absence of cloud storage capability.

UI/UX

User Interaction Requirements
Ref	Requirement	Priority
UI1	Clear visual indication of active recording status through LEDs.	Must
UI2	Easy navigation between prompts using physical buttons.	Must
UI3	Dedicated button to stop the session and securely store audio.	Must
UI4	Audio or Visual notifications/indications when storage or battery capacity is low.	Could
UI5	Optional voice commands to navigate exercises.	Could
UI6	Exclude advanced manual audio processing controls for simplicity.	Won’t

User interactions are designed to be intuitive, enabling children to comfortably navigate therapy sessions (UI1, UI2, UI3) without assistance. Additional audio notifications (UI4) provide helpful prompts, and potential voice command options (UI5) may further simplify operation. Advanced settings are deliberately excluded (UI6) to maintain simplicity for the primary child users (and also due to the lack of time available to work on the project).

The test plan is as follows:


Ref	Precondition	Action	Expected Output
UI1	Plush robot is powered on.	Initiate audio recording session.	LEDs clearly indicate active recording status immediately.
UI2	Session ongoing.	Press "Next" button.	Plush robot navigates to next prompt immediately and clearly.
UI3	Session ongoing, recording active.	Press dedicated "Stop" button.	Recording stops immediately and audio securely stored.
UI4	Low storage/battery conditions simulated.	Fill storage nearly full and/or drain battery low.	Robot issues clear audio or visual notifications indicating low storage or battery.
UI5	Voice commands implemented.	Navigate prompts using voice commands.	Robot accurately navigates prompts using voice interaction.
UI6	Check product documentation/design specification.	Verify available UI options.	Confirm explicitly stated absence of advanced audio processing UI controls.

Data Handling and Privacy

Data Handling and Privacy Requirements
Ref	Requirement	Priority
DH1	Safe private storage of all collected data.	Must
DH2	Encryption of stored data.	Should
DH3	Facility for deletion of data post-transfer.	Should
DH4	Optional automatic deletion feature to manage storage space.	Could
DH5	No external analytics or third-party integrations unless anonymized .	Won’t

Data privacy compliance is paramount, thus mandating the need for safe and secure storage (DH1). Encryption (DH2) and data deletion capabilities (DH3, DH4) enhance security measures, while explicitly excluding third-party integrations (DH5) aligns with data protection and privacy goals. If third-party integrations or analytics are used then it is important for the data processed by it to first be anonymized.

The test plan is as follows:

Data Handling and Privacy Test Plan
Ref	Precondition	Action	Expected Output
DH1	Robot has collected data.	Inspect robot’s storage location and methods.	Confirm data is stored locally/offline with no online/cloud-based storage.
DH2	Data encryption implemented.	Attempt accessing stored data directly without proper keys.	Data is inaccessible or unreadable without proper decryption.
DH3	Data has been transferred.	Delete data via provided facility after transfer.	Data is successfully deleted from local storage immediately after confirmation.
DH4	Data exists (old)	Fill storage and observe automatic deletion feature.	Old data automatically deletes to maintain adequate storage space.
DH5	N/A	Inspect software.	Confirm absence of external analytics and third-party integrations...

Performance

Performance Requirements
Ref	Requirement	Priority
PR1	Prompt playback latency under one second after interaction.	Must
PR2	Recording initiation latency under one second post-activation.	Must
PR3	Real-time or near-real-time audio noise filtering.	Could
PR4	Optional speech detection for audio segment trimming.	Could
PR5	Exclusion of cloud-based or GPU-intensive AI processing.	Won’t

Performance standards demand rapid, seamless interaction (PR1, PR2) to maintain user engagement and provide a good user experience, supplemented by noise filtering capabilities (PR3) to provide the therapists a good experience though it is not exactly necessary for the functioning of the robot. Advanced optional speech-processing features (PR4) are nice to haves but again not necessary, and high-resource cloud-based AI operations (PR5) are explicitly omitted to maintain simplicity and most importantly data security.

The test plan is as follows:

Performance Test Plan
Ref	Precondition	Action	Expected Output
PR1	Plush robot powered on.	Initiate speech prompt via interaction/button press.	Prompt playback latency is less than or equal to X seconds.
PR2	Plush robot powered on.	Start recording via interaction/button press.	Recording initiation latency is less than or equal to X second.
PR3	Noise filtering feature implemented.	Record audio in background-noise conditions and observe immediate playback.	Real-time or near-real-time audio noise filtering noticeably reduces background noise.
PR4	Speech detection implemented.	Record speech with silent pauses.	Audio segments are correctly trimmed to include only relevant speech portions.
PR5	N/A	Inspect software.	Confirm absence...

Application

Application Requirements
Ref	Requirement	Priority
APP1	Secure login system requiring therapist and robot passcodes.	Must
APP2	Encrypted and Authenticated data transfer over HTTPS.	Must
APP3	UI must be intuitive and allow easy navigation through audio recordings.	Must
APP4	Visual flagging of audio quality (Red, Yellow, Green, Blue).	Must
APP5	Audio playback should occur directly in-browser.	Must
APP6	Manual override and reprocessing of flagged audio, triggering resend to robot.	Must
APP7	Role-based access control to isolate data per therapist.	Must
APP8	Application should be responsive and accessible across platforms.	Must
APP9	Session tokens should persist login securely for a defined period (e.g. 1 day).	Should
APP10	Transcript generation via integrated AI model.	Should
APP11	Therapist UI should auto-update when new recordings are uploaded.	Should
APP12	Admin panel to manage course content dynamically.	Should

The therapist-facing web application is another central part of the speech diagnosis system, providing a secure, intuitive interface for reviewing and managing recordings which is what will actually enable the therapist to have a nicer experience with less bias when performing the diagnosis. To meet its functional and clinical goals, the application enforces a dual-passcode login system (APP1) that authenticates both therapist and robot identities, ensuring only authorized access. All data transmission is secured through HTTPS encryption (APP2), upholding privacy and regulatory compliance along with custom encryption to validate authenticity.

To improve therapist workflows, the UI must also be intuitive and navigable (APP3), with an in-browser audio playback system (APP5) and visual quality indicators (APP4) that help prioritize attention of specific recordings. Therapists must also be able to manually override audio flags and resend prompts to the robot (APP6) when a recording is insufficient. The app enforces strict role-based access control (APP7), ensuring data is siloed between therapists, and must be responsive across devices (APP8), supporting both desktop and tablet workflows.

Additional features enhance usability and flexibility: persistent session tokens (APP9) reduce login friction during active use, automated transcript generation (APP10) simplifies documentation, and real-time dashboard updates (APP11) help therapists stay in sync with robot-side activity. A dedicated admin panel (APP12) further supports dynamic content management, enabling continuous evolution of therapeutic content without developer intervention.

The test plan is as follows:

Application Test Plan
Ref	Precondition	Action	Expected Output
APP1	App is deployed	Attempt login with valid therapist and robot passcodes	Secure login is successful; access granted only when both passcodes are valid
APP1	App is deployed	Attempt login with one or both invalid passcodes	Access denied with clear, user-friendly error message
APP2	App is live	Monitor network activity during login and uploads	All requests are transmitted over HTTPS; no plaintext data is sent and also an authentication signature is passed to validate authenticity.
APP3	Therapist logged in	Navigate through dashboard and audio recordings	Navigation is smooth, labels are clear, and all key functions are easily discoverable
APP4	Audio files available	View audio table	Files are color-coded based on quality flags (Red, Yellow, Green, Blue)
APP5	Audio files uploaded	Click play on an audio recording	Audio plays in-browser without downloading or external tools
APP6	Poor-quality audio identified	Therapist flags it for reprocessing	Recording turns Blue; robot receives re-ask instruction during next poll
APP7	Two therapists registered	Each logs in with their own robot passcode	Each sees only their respective data; no cross-access to recordings or transcripts
APP8	Access from various devices	Open app on phone, tablet, and desktop	UI adapts properly; all functionalities remain intact
APP9	Therapist logged in	Close browser and return after several hours	If within session time, access is maintained; otherwise redirected to login
APP10	Transcript service enabled and recording exists	Click "Generate Transcript" button on a recording	Transcript appears in UI with non-verbal content filtered; therapist can edit/download
APP11	Robot uploads a new recording	Watch therapist dashboard	New recording appears in real time with correct metadata and flag
APP12	Admin user logged in	Create a new course and lesson in admin panel	Content is saved and becomes accessible to therapists on next login

Legal & Privacy Concerns

Note: We are only going to concern ourselves with EU legislation and regulations as this is our country of residence. Furthermore most of these regulations concern themselves with a full scale implementation of this robot.

We will mainly be making reference to the following regulations/Legislation:

General Data Protection Regulation GDPR (https://gdpr-info.eu/)
Medical Device Regulation MDR (https://eur-lex.europa.eu/eli/reg/2017/745/oj/eng)
AI act (https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R1689)
UN Convention on the Rights of the Child
AI Ethics Guidelines (EU & UNESCO)
Product Liability Directive (EU 85/374/EEC)
ISO 13482 (Safety for Personal Care Robots)
EN 71 (EU Toy Safety Standard)

Data Collection & Storage

The robot we want to build for this project requires that some specific audio snippets and data to be collected and stored somewhere where, therapists and professionals that are responsible for the patient's care can access it. This data however is sensitive and must be secured and protected so it is only accessible to those who are permitted to access it. We should also focus on storing the minimum required amount of data on the patient using the robot to make sure only necessary data is stored. These specific data collection and storage concerns, in the EU, are outlined in Articles 5 and 9 of the GDPR.

in this context this means the data collected by the robot should at most include:

Speech audio data of the patient needed by the therapist to help treat the patients impediment
minimal identification data to know which patient has what data.
Other data may be needed but must specifically be argued (subject to change)

Furthermore all the data collected by the robot must be:

encrypted, so if somehow stolen cannot be interpreted
securely stored, so it can be accessed by the relevant permitted parties

In addition to the basic principles of data minimization and secure local storage, Article 25 of the GDPR mandates "privacy by design and by default". This means the architecture of the robot must enforce strict defaults: storage should be disabled until parental consent is granted, recordings should be timestamped and logged with audit trails, and access should be time-limited and traceable. Furthermore, for deployments in schools or healthcare settings, the system must be able to integrate with institutional data protection policies and local data controllers.

User Privacy & Consent

In order for the robot to be used and for data to collected and shared with the relevant parties, the patient user must consent to this and they must also hold specific rights over the data (creation, deletion, restriction etc). On top of this depending on the age of the patient certain restrictions must be placed on the way data is shared, and all patients must have a way to opt-out and withdraw consent from data collection if necessary. These are all covered in Articles 6, 7, 8 of the GDPR.

In essence this means the user must have the most power and control over the data collected by the robot, and the data collected and its use must be made explicitly clear to the user to make sure that its function is legal and ethical.

Under Article 8 of the GDPR, children under the age of digital consent require explicit parental permission to engage with services that collect personal data. In practice, this means the robot must support consent verification mechanisms, such as requiring a digital signature from the guardian or integrating a PIN-based consent process at setup. Additionally, Recital 38 of the GDPR advises that the privacy notices shown to both child and guardian be presented in age-appropriate language. For example, using pictograms or child-friendly animations to explain when the robot is recording would not only support legal compliance but also improve trust and understanding.

Security Measures

Since we must exchange sensitive data between the patient and therapist, data must be secured and protected in its transmission, storage and access. These relevant regulations are specified in Article 32 of the GDPR (Data Security Requirements).

This means that data communication must be end-to-end encrypted, and there must be secure and strong authentication protocols across the entire system. On the therapists end of things there must be relevant RBAC (role based access control) so only the relevant admins can access the data. In real time use over long periods of time there should be the possibility of software updates to improve security.

To comply with Article 32, the robot's firmware should be hardened against attacks (e.g. disabling remote debugging in production) and should support secure boot to prevent tampering. Additionally, data uploads must use TLS 1.3 or better, with server authentication validated using public key pinning. Therapist access should require MFA (multi-factor authentication), and if possible, sessions should auto-expire after inactivity. Finally, a detailed Data Protection Impact Assessment (DPIA) should be conducted prior to releasing the product to users, as required for any product handling systematic monitoring of vulnerable groups (Art. 35).

Legal Compliance & Regulations

Since this robot can be considered as a health related or medical device, we must check and make sure that the data collected is used and treated as medical data. All regulations relevant to this are specified in the Medical Device Regulation.

This Robot may also have certain AI specific features or functionalities so this must also fall within and adhere to regulations and laws present in the AI act so that the functionality and usage of the robot is ethical.

Ethical Considerations

Since the patients using this device and interacting with it are children, we must make sure that the interactions with the child are ethical and the way in which data is used and analysed in order to form a diagnosis is not biased in any sort of way.

The robot must minimize psychological risks of AI-driven diagnosis, prevent any possible distress, anxiety and deception that interaction could cause. Training assessments should be analysed in a fair and unbiased manner and decisions on treatment and required data for a particular stage of treatment should be almost entirely decided by the therapist with little to minimal AI involvement.

These are all outlined in the AI Ethics Guidelines and article 16 of the UN Convention on the Rights of the Child.

The UN Convention on the Rights of the Child Article 3 requires that all actions involving children prioritize the child’s best interest. In practice, this means ensuring the robot never gives judgmental feedback ("That was bad") or creates hierarchical comparisons with other children’s responses. The robot should instead focus on affirmative reinforcement and therapist-supervised decision-making. Furthermore, models used to analyze speech should be trained and/or tested on diverse data to avoid systemic bias. If AI is integrated into the system, then the AI Act will require such fairness by design in all child-facing systems.

Third-Party Integrations & Data Sharing

Since we are sharing the data collected from the robot to the therapist, we must ensure that strict data-sharing policies are in place that require parental/therapist consent. Furthermore if we use any 3rd party services, like cloud storage providers, AI tools, or healthcare platforms we must make sure data is fully anonymised so no there is no risk of re-identification.

Per GDPR Article 28, any third-party processor (e.g., cloud hosting or transcription services) must sign a Data Processing Agreement (DPA) that details how long data is stored, how it is deleted, and under what jurisdictions it is hosted. Preference should be given to EU-based service providers, as storing and processing data within the EU ensures that all operations fall under the same data protection regulations (like the GDPR). If any data must be transferred to or processed by services outside the EU, additional legal safeguards—such as Standard Contractual Clauses (SCCs)—must be in place to maintain the same level of privacy and security.

Liability & Accountability

In case of issues with function or potential data leak we must make sure to hold the responsible parties accountable. This is especially important in the case of AI functionalities as no responsibility can be placed on such actors, and must be placed on manufacturers and programmers.

If there are technical issues with the function of the robot or issues with data transmission encryption we (the creators of the product) must be held accountable for the issues with robot. If there are issues with storage of data due to failures of 3rd party systems those system creators must also be held accountable for the issues in their system. If the therapist or medical professional that is treating a patient leaks data or provides bad treatment intentionally or otherwise they must also be held accountable for their conduct and actions.

These are all specified in the Product Liability Directive under manufacturer accountability.

User Safety & Compliance

Since the robot interacts directly with children, we ensure its physical safety through non-toxic materials and a child-safe design, comply with toy safety regulations if applicable, prevent harmful AI behaviour by avoiding misleading feedback and ensuring therapist oversight, and adhere to assistive technology standards for accessibility and alternative input methods.

This is so we adhere with ISO 13482 and EN 71 (EU toy safety standard). Compliance with EN 71 ensures no sharp edges, toxic materials, or choking hazards are present. Meanwhile, ISO 13482 sets requirements for robots interacting closely with humans, mandating passive safety, mechanical stability, and safe stopping behaviors. The plush must therefore undergo drop testing, force-limiting evaluation for servos, and potentially even fire-resistance testing depending on local regulations.

Interviews and Data collection

We will also be conducting interviews with several relevant volunteers upon this matter as well as experts in the field. To ensure ethical standards, we will obtain informed consent from all participants, clearly explain the purpose of the interviews, and allow them to withdraw at any time. Confidentiality will be maintained by anonymizing responses and securely storing data. Additionally, we will follow ethical guidelines for research involving human subjects, ensuring that no harm or undue pressure is placed on participants.

First interview

What does the current process for conducting speech impediment therapy look like with young patients/children?

The test is divided into multiple subsections, each testing a certain trait of speech; however, there are two main groups of test questions:

A set of strict questions, which are constrained questions, is mostly an exercise of repetition. It is known as production constraint
Set of spontaneous speaking questions. Where the patient needs to speak for a certain amount of time.
- For example: “tell me about how was your last holiday” for 60 seconds

What is the most time-consuming aspect of speech therapy?

By far, the most time-consuming aspect of diagnostics is not passing the test with the patient but the analysis of the data that has been collected. A lot of data has been connected, and the speech therapist needs to analyse them and begin to come up with a conclusion of whether the patient has a certain speech impairment or not. If it is not the first time the patient has taken the test and he is retaking it to see whether there is an improvement, we need to analyse if there is an improvement, where the improvement is and what still needs to be worked on.

What are the current limits of the current diagnostic tests?

In the diagnostics context, the current limits are for sure the biases that affect the test result. Well, the main bias is the speech therapist itself. The test result can be affected by the state of the speech therapist. Whether she is actively speaking, in what emotional state she is, or whether she is tired, all of these biases result in a non-objective view from the speech therapist. This is where the robot can have an impact, removing those biases and reaching a fully objective and logical test result. Also, what I can see from your robot is that it can save time. Often in diagnostic tests, patients need to repeat many sentences; however, with a robot, it can make the patient repeat only certain well-chosen sentences and then, based on that can do a profound analysis on the recording.

How do kids usually experience diagnostic tests?

Kids often experience the test to be:

Long
Tiring
Laborous
Can put people in difficult situations
Put them in front of their difficulties

Do you think there are areas in the speech diagnosis system which need improvement? What are the biggest challenges when diagnosing speech impediments in young children at the moment?

Well yes all of these biases we discussed before. If we can remove them then we can have diagnostics that are much more reliable. A tired kid due to the test can result in a bias, as he might not perform as well. This is where a robot can also remove a bias, remove the factor of tiredness. If it instead of asking the patient many question it asks to repeat ceratin, limited amount of sentence and perform deep analysis, it will remove the bias of a patient being tired.

How do you currently track a child's progress during therapy sessions?

By retesting them after a program of reeducation, and making them do the exact same test. Compare both results. Most speech impairment are neurological, which means that normally with no treatment there is little to no improvement. A reduction is necessarly thus needed for those type of patient

Design

Device Description Appearance:

The design needed to not attract the target patient visually and enourage them to interact and play with it but needed an adequate form for the additional hardware to be installed on it. The microphone need to be installed in an area where the patient would natuarally speak when interacting with the plush toy. Buttons had to be placed in locations that the patient would typically touch or hold. The LEDs had to be installed visible areas, with their placement helping to explain their role.

Plushie used for the project

An elephant plush was found suitable for all these requirements. It features a trunk that could house the microphone and the recording button. This placement is ideal as it allowed the microphone to be as close as possible to the patient, helping the focus of the patient's speech while reducing any disturbing surroundings. An elephant plush has two ears that can be motorised to bring life to the toy and provide informative cues - such as raising the ears to indicate to the patient that the robot is recording, which mimics a comprehensive listening gesture. As for the buttons, they can be integrated into the paws of the plushie as children often play with the extremities of plush toys.

Internal Hardware:

A wide variety of electronic components were required to meet the project's requirements. These components had to be carefully studied to ensure proper functionality when assembled together. The required components were divided into four functional areas: Communication, storage, audio, and actuation

Regarding the communication aspect, a wireless technology had to be selected to enable interaction between the application and the plush. This set communication allowed the robot to transmit the patient's recorded answers to the application for later review. This chosen technology had to be wireless, accessible and already supported by the device running the application. Considering these requirements, two mainstream and widely supported technologies were evaluated: Bluetooth and Wi-Fi^[40]. Both of them were viable options, each of them with their own advantage and drawbacks. When assessing the Bluetooth technology, it is an easier one to implement as it is a direct device-to-device communication protocol, meaning it functions without the need of a network infrastructure, which Wi-Fi requires. It also consumes less power, which is needed for battery-operated devices. Regarding the high data transfer rate it is a key requirement for the project, as it needs to transmit multiple audio recording, where each is potentially multiple megabytes however bluetooth provides a lower data transfer (1-24 Mbps^[41]) rates compared to Wi-Fi (50-1000+ Mbps^[42]). As a result of this requirement, the Wi-Fi communication protocol was the most suitable one for the project.

In order to function properly, the question of how to store large and high quantities of audio files had to be answered. It was certain that the internal memory of the microprocessor to be chosen would not suffice due to its lack of storage space. A secondary storage device was therefore needed. Many options were available: SSDs, hard drives, SD cards and microSD^[43]. However, the requirement that allowed the choice between them to be a simple one was that the secondary device had to be as compact and small as possible. The reason behind this is that it had to fit inside the robot and not add too much weight to it. The micro SD card was therefore the chosen storage technology chosen for this project as it was the one that was the best suited to the requirements

Regarding the functional audio area of the project, the robot needed to have the functionality to be able to record and play audio files. For the recording aspect, a microphone suffices to cover this requirement. The microphone records the patient and directly transmits the data to the microprocessor to be processed. For the playing aspect, a speaker is needed to transform the audio in electrical form into actual sound; however, for a speaker to be loud enough, an amplifier ^[44]is needed.

For the actuation area, two servo motors were chosen to be integrated in the plush toy to enable the ear movement. As previously described allowing it to be expressive and appealing to the patient. Besides the servo motors, buttons and leds are included to enable the robot to understand the patients intents, and convey through the LED a visual feedback that the input have been captured. The patient will be able to interact with robot through three bottons: "recording", "yes", and "no" button. There will be three LEDs: a "on/off" led, one to show that the button "yes" has been pushed and one to show that button "no" has been well push.

Based on the paragraphs above here are the following components needed for the project:

Component Types Needed
Component Type	Quantity
SD Card	1
Amplifier	1
Speaker	1
Microphone	1
LED	3
Wifi module	1
Microcontroller	1
Servo Motors	2

The following sub sections, evaluates the specific model needed for each component type stated in the table above. Once the model determine, a research is completed on how the model will be connected to other components and if there is any specific functionalities the the microcontroller requires to sucessfully communicate with the component.

Micro SD card

An appropriate micro SD storage size had to be chosen for the project. The storage size had to be large enough to be able to store more than three hours of audio recording since the diagnostic test is around this duration. A 64 GB micro SD ^[45]card was available to purchase however calculations had to made to ensure the requirement were met. Here is the calculations that were done:

The recordings were in the form of WAV file and the size of those files can be closely approximated with the following equation:

Sizes (in bytes) = sample Rate * bit depth * channels * duration (seconds) / 8^[46]

Sample rate: 16kz, a typical rate

Bit depth: 24 bits, based on the bit depth outputed by the INMP441 MEMS Microphone

Channels: 1, INMP441 MEMS Microphone outputs one channel

Duration: 60 seconds

Calculate WAV file size of one minute = 16 000 * 24 * 1* 60 / 8 = 230,400,00 bytes = 23.04 mega bytes

Thus 64 gigabytes, in megabyte is; 64 *1024 = 65536

Calculate how many minutes the 64 GB micro SD card can save: 65536 / 23.04 = 2,844

The card is capable of storing to 2,844 minutes of recording, thus 47,4 hours of recording. Therefore the storage the micro SD card provided is as sufficiently enough for the project.

Connection

For a microcontroller to sucessfully communicate with micro SD card the following communication^[47] lines are needed:

- VDD: 3V3

- MISO: pin takes care of transfer data from the sd card and the microcontroller

- GND: Ground

- SCLK: Serial Clock, sync between the sd card and the micro controller

- VDD: 3V3

- GND: Ground

- MOSI: pin takes care of transfer data from the microcontroller to the sd card

- CS: Chip select, pin used to select the role between the micro controller

Based on these connections, for a microcontroller to be able to communicate with the micro SD card, it will therefore need, four individual GPIO pins, a 3.3 volt output and a ground pin.

Amplifier

For the amplifier to fit the project, it first must be able to be powered with a microcontroller, which means it needs a power rating on 3.3V - 5V, as these are the voltages, the general microcontrollers (Arduino^[48], Raspberry Pi^[49], ESP32^[50]) can output. Additionally, these modules can be a fire hazard if installed incorrectly, as some of them produce a lot of heat^[51]. Therefore to minimse those risks a class D amplifier^[52] is needed, which are power efficient and low heat amplifiers. These amplifier types not only reduce the described hazard but also minimse the power consumption from the battery. The MAX98357 ^[53]is an amplifier that reaches those requirements. It is five volt, Class D amplifier that is not only compact but also affordable coming at a price of 8.50 euro.

Connections:

For a microcontroller to sucessfully communicate with the amplifier the following communication lines are needed:

SPK+ : Speaker Postive terminal

SKP- : Speaker Negative terminal

DIN : Data input

BCLK : Bit Clock

LRC:

GND : Ground

VDD : 5 volt

Based on these connections, for a microcontroller to be able to communicate with the amplifier, it will therefore need, three individual GPIO pins, a 3.3 volt output and a ground pin, and 2 connections needed for the speaker.

Microphone

For a microphone to be suitable for the current project, it is prefered to be one that is known to be easily compatible with modern general-purpose microcontrollers such as the Arduino^[48], Raspberry Pi^[49], and ESP32^[50]. The board needs to be compact enough to be easily integrated in the trunk meaning that it should not exceed 1.5cm wide and 1.5cm in length.

Two microphone were tested for the project, both of which met these requirements: Electret Condenser Microphone^[54] and INMP441 MEMS Microphone. The INMP441 MEMS Microphone^[55] was proved to be a better match since it provided consequently less noise when recording.

Therefore, the INMP441 MEMS Microphone was selected for the current project.

Connections:

VDD ---- 3.3V supply

GND ---- Common Ground

SCK ---- Serial Clock

WS ---- Word select

SD ---- Serial data

Based on these connections, for a microcontroller to be able to communicate with the microphone, it will therefore need, three individual GPIO pins, a 3.3 volt output and a ground pin.

Microcontroller Unit:

Multiple development boards were taken into consideration for the project, including Arduino Nanos^[56], Raspberry Pi^[49], and Arduino Unos.^[57] However, the board ESP32 ^[58]development board from TinyPico was a clear choice.

Firstly, in terms of physical dimensions, this ESP32, is the world's smallest, fully featured ESP32 development board on the market^[59]: 18mm by 32mm. Another reason in selecting this development board its efficient deep sleep functionality, allowing to draw from the battery as little as 20uA, significantly lower than the Arduino Nano who typically draws around 7.4 m^[59]A.

The final reason that confirmed the choice is that it contains the exact number of GPIO pins required, and it has an inbuilt wifi module. All of these functionalities are at a reasonable price of just 19.80 euros.

Needed GPIO pins
Component	GPIO pins needed
Buttons	3
Amplifier	3
Microphone	3
Servo Motors	1
Micro SD card	4

Materials Bought


Components	Price per Component	Quantity	Total Price
MAX98357 I2S Amplifier Module	8.50	1	8.50
INMP441 MEMS Microphone	3.50	1	3.50
Sandisk Ultra 64GB microSD card	10.00	1	10.00
SD-card adapter	3.00	1	3.00
Speaker Set - 8Ω 5W	6.75	1	6.75
ESP32 Development Board - USB-C	19.50	1	19.50
8 Pin IC socket	0.30	1	0.90
16 Pin IC socket	0.40	1	0.40
14 Pin IC socker	0.35	1	0.35
Jumper wire Male-Female 20cm 10 wires	0.75	1	0.75
Jumper wire Male-Male 20cm 10 wires	0.75	1	0.75
Jumper wire Female -Female 20cm 10 wires	0.75	1	0.75
40 Pins header Male	0.30	2	0.60
16 Pins header Female	0.32	1	0.32
10Ω-1MΩ Resistor Set	5	1	5.00
Push button	0.15	3	0.30
Switch with lever	1.00	1	1.00
			62.37 €

Planning Phase

The first stage of implementation was creating a schematic diagram of the circuit. This allowed to plan how each electronical module connect together and how the software of the esp32 would function. The software team had insight of the role of each pin of the ESP32 helping them to start the code implementation. Once the diagram was created and the whole team agreed on the structure, it had to then be implemented on a breadboard.

Testing Phase

The circuit was initially assembled a breadboard, to have a plateform for testing both the software and the hardware. During the phase the software and the hardware configuration were finalised.

During this phase, multiple hardware modifications were implemented to improve the quality of the recorded audio files. The microphone setup was redefined by reducing the length of the wire used and applying a ground voltage to one of the unused microphone pins (L/R pin). Additionally, a pull-up resistors were also added between the ESP32 and the SD card. All of these changes significantly improved the audio quality. However the last affordable modification left on the circuit to be made was to solder the entire circuit onto a PCB board, allowing constant stable and reliable wire connections.

Soldering Phase

Once the circuit was finalised on the breadboard, it was replicated onto a PCB board, which includes the microphone, ESP32, and amplifier. The remaining components were not permanently connected to the ESP32; instead, soldered male header pins on the PCB and female jumper wires on the component were used, allowing components to be easily connected and disconnected to the main system. This design choice allows the buttons, LEDs and servo motors to be easily disconnected and reconnected for replacement and to be independently integrated into the plush without the entire circuit in the way while installation.

WhatsApp Image 2025-04-07 at 14.48.29 746d0769.jpg

Once the main PCB board was completed and the circuit was fully functional, all of the electronics were integrated into the plush. The servo motors were glued inside the ears, buttons were installed in the paws and trunk, and LEDs were placed in both tail and the paws. Extensive sawing was made to discreetly hide the buttons beneath a layer of fabric allowing them to be easily accessed while maintaining the plush's asethetic. A backpack was also sewn on the back of the elephant plush to house the speaker.

WhatsApp Image 2025-04-10 at 15.49.07 1a7035f1.jpg

WhatsApp Image 2025-04-10 at 15.49.06 72bd0a4f.jpg

WhatsApp Image 2025-04-10 at 15.49.06 e6feafb1.jpg

System Specification

Software

Robot

This section covers the internal software of the device. The software is designed in such a way that it accounts for unexpected use cases, with the device capable of executing adequate actions whenever met with unexpected or expected scenarios. Below shows the exact actions the robot can execute and the interactions accounted for between the patient and the device.

Patient Robot Interation Use Case Diagram

Robot Activation

The robot is activated when the patient presses a button. The setup() function initializes WiFi, sets up GPIO inputs for physical buttons, and prepares the SD card and microphone. It also connects to a server via HTTP to receive updates (e.g., question wanting to be reasked) using the postReceiveUpdates() function, which authenticates via HMAC-SHA256.

Robot Greeting and Termination

The play() function handles all audio output. It uses WAVFileReader to stream audio files (e.g., greeting1.wav and goodbye1.wav) from the SD card through the speaker. This function is used multiple times:

At startup to play a greeting.
After diagnosis completion or a "no" response to play goodbye messages

Moving Motors

To create a more engaging experience, the robot physically moves its ears towards the patient when it listens to their response using a servo motor controlled by a function moveServoToAngle(). It turns toward its ears to the patient before recording (angle 0°) and back afterward (angle 150°), mimicking natural interaction and improving attention focus.

Retrieving Questions

The robot reads from a file called questions.txt using get_first_question_number() to fetch the next pending question ID. It then plays the corresponding audio file (e.g., question3.wav) with the play() function. If there are no more questions, it exits the loop after playing goodbye2.wav.

Wait for Respone

Interaction is guided by physical buttons:

wait_for_continue_response() checks if the patient wants to proceed (yes/no buttons).
Depending on the input, the system either continues asking questions or terminates with a goodbye message

Record Response

Once a question is played, the patient holds a button to answer. While held, the record() function captures audio from the microphone and writes it to a .wav file (e.g., answer3.wav). The system stops recording either when the button is released or after 2 minutes.

Answer Question

This part ties the decision-making flow:

If the patient presses "Yes," the robot proceeds with the next question.
If "No," the session ends or loops based on design (in this case, it re-asks after delay). These user interactions are facilitated by physical buttons monitored through software logic.

Upload and Notify

After each session or question, the robot may send responses to a remote server:

requestUploadAndNotify() triggers an upload process by contacting an API, uploading the .wav file via HTTP PUT, and then calling postNotifyUpload() to confirm receipt.
Each upload uses the method hmacSha256() to encrypt messages to ensure secure communication

Flowchart of User Device Interaction

Flowchart Diagnostic Session

Application

This section outlines the functionality of the therapist-facing web application used in the speech impairment diagnosis system. The application provides a secure and intuitive interface through which therapists can access and manage speech recordings collected by the robot. It allows for reviewing audio quality, generating transcripts, and requesting follow-up interactions with the child, ensuring high diagnostic accuracy while maintaining data privacy.

Application Use Case Diagram

Authentication and Access Control

Access to the platform is gated by a dual-passcode system:

Therapist Passcode – Unique to each registered therapist.
Robot Passcode – Unique to each robot, linked to its assigned therapist.

Only when both passcodes are valid is access granted. This ensures strict isolation of data, allowing therapists to only view recordings associated with their assigned robots. This means it supports a global use case where multiple therapists can securely work with their own patients that have their own robots without any data leakage.

access control sequence diagram for application

The authentication flow is illustrated in the sequence diagram above. When a login attempt is made, the system verifies the therapist’s passcode and the robot’s unique code against the database. If both credentials are valid, the user is granted access and issued a secure session token to persist the login for a day (this can be extended to make the app nicer to use for the therapist). If either check fails, an error is returned and the UI reflects the failed login attempt, preventing unauthorized access to protected data.

Login page to the app

Core Functionalities

Audio Table and Review Panel

The Audio Table serves as the central dashboard for therapists, providing a clear and organized overview of all audio recordings associated with the authenticated robot. Each row in the table corresponds to a specific child response, complete with metadata to help therapists make informed decisions quickly.

Dashboard of the app

Key features include:

In-browser audio playback: Listen to recordings directly within the interface—no downloads required making it easier for the therapist as it keeps everything more organized.
Rich metadata display: Each recording entry shows the original question prompt .
Advanced filtering and search: Therapists can easily locate recordings by searching for specific question text or filtering by quality flags.
Visual flagging system: Audio files are automatically color-coded based on quality:
- Red – Poor quality
- Yellow – Moderate quality
- Green – Good quality
- Blue – Reprocessing in progress

This interactive and intuitive UI/UX streamlines the review workflow, allowing therapists to triage recordings more efficiently and focus their time on meaningful clinical insights which can help reduce bias in these tests.

Quality Flagging and Reprocessing

Audio recordings are automatically assigned a quality flag using basic metrics such as loudness and duration. These initial flags help triage the data, but therapists can override them after manually reviewing each recording. If a recording is deemed unusable, the therapist can trigger a "Retry" action, which flags the file as Blue and queues it for re-asking by the robot during its next interaction with the child.

flag setting sequence diagram

Transcript Generation

Each recording can be transcribed using Whisper (an automatic speech recognition system). The resulting text is then processed through a language model (GPT) specifically tuned (with prompt engineering) for speech therapy scenarios. This helps filter out irrelevant or non-verbal responses such as crying or incoherent speech. Therapists are able to view, edit, and download these transcripts as needed. To protect patient privacy, transcripts are also fully anonymized—no voice data is stored, and nothing in the text is traceable to the child (based on the questions we have examined).

transcript sequence diagram

Security and Privacy

Audio uploads are handled through signed URLs and securely stored in a cloud bucket. To prevent unauthorized access, all communication from the robot is authenticated using SHA-encrypted headers. Only verified robots and therapists are permitted to upload or access data. Additionally, all network communication is encrypted over HTTPS, providing a secure channel that safeguards sensitive healthcare data end-to-end.

Robot–App Interaction

Robots initiate all interactions with the backend, including uploading audio recordings and polling for new instructions. It does so by using three REST API endpoints:

`POST /api/request-upload`

This route will be used by the robot to request a signed upload URL for sending audio data. As seen in the sequence diagram below, upon requesting upload, the backend first validates the authenticity of the request using the robot’s unique passcode and a SHA-encrypted signature. If the signature is valid, the system returns a time-limited signed URL, which the robot uses to upload the audio file directly to the cloud storage. Once the upload is complete, the robot issues a second request to notify the server (discussed below). If the signature fails validation, the server responds with an error and the upload is aborted to maintain system integrity and prevent unauthorized access.

Reprocessing/Flag sequence diagram

`POST /api/notify-upload`

This endpoint is called immediately after a successful file upload using the signed URL obtained via the POST /api/request-upload route. It acts as a confirmation step, allowing the robot to notify the backend that the audio file has been successfully uploaded.

As shown in the upload sequence diagram, once the app receives this notification, it triggers a chain of actions:

The uploaded file is retrieved from cloud storage.
The system performs a verification and flagging check using verifyAndUpdateFlag().
The therapist-facing UI is updated in real-time to reflect the new recording and its associated quality flag.

This separation between file upload and post-processing ensures a clean, event-driven architecture while keeping the robot-side logic relatively lightweight.

`GET /api/receive-updates`

This route is used by the robot during its boot-up routine to check for pending therapist actions, such as re-ask requests for low-quality or missing audio.

As visualized in the update polling sequence diagram, the robot sends a requestUpdates(robotCode) call, which includes its unique code. The backend first authenticates the robot by validating its signature. If valid, it queries the database for any queued actions tied to that specific robot and returns them along with a fresh signed upload URL (for convenience, so the robot is ready to record again immediately). If the signature is invalid, the app returns an error and halts further interaction.

This polling model keeps the robot's behavior simple and stateless. It eliminates the need for persistent connections or active listening, which is essential given the limited resources of embedded devices like the ESP32.

robot question update sequence diagram

Testing

For the testing in this project we focused on performing tests on interaction with the robot as we felt this was the most important of all the aspects of the robot.

To ensure the robot would perform reliably in real-world settings, interaction-based testing that simulated a wide range of human behaviours and emotional responses were conducted. These simulations were designed to reflect real-world unpredictability—such as a child crying, an individual ignoring the robot’s prompts, or a user exhibiting confusion or distress. Each scenario tested the robot’s ability to respond expectedly and consistently.

Different behavioural scripts were created to simulate these situations, involving both controlled environment testing and live role-playing. This method allowed for an assessment of the robot's capacity to interpret emotional cues, adapt its execution, and maintain a coherent presence. For example, if the device user began crying during a response, the application would flag this and displayed in the app whether the application user would like to have the question repeated in the next session.

A key part of the testing process involved verifying that no matter how unpredictable the behaviour, the robot would not crash, freeze, or behave erratically. Each edge case was logged and refined, making sure that robust fall-back behaviours were in place. This testing helped fine-tune the robot’s decision trees and response models, preparing it for nuanced human interactions in different situations.

Below is a demo of a standard testing interaction between a user and the device.

Demo

https://drive.google.com/drive/folders/12wH6WRij3pu-DExacPk9iGUBUQJe9NIH?usp=drive_link

Analysis and Limitations

From the testing we will able to complete within our group we realized 2 main specific points. The way in which the robot speaks to and gives speech instructions to the child must be very clear and well enunciated so the child not only has time to listen, but also has the best chance of understanding what the robot says the first time.

Secondly the number of potential cases and interactions and their reaction by the robot must be finely tuned and will take a lot of trials and tests to perfect. In our current implementation the way questions are answered and asked is rigid but very forgiving. If you make a mistake it is treated the same as if you answered well which, although problematic as it can normalize bad answering habits, places very little pressure on the interacting user. So although you will get many potentially bad quality recordings, the few that do come through correctly will be very representative and useful in the diagnostic process. What recordings are bad can easily be repeated thanks to the system updating the robot based on the flags and issues it detects. The natural inclination to fix this would be to just add reactions for every possible mistake and request a repeated attempt on the spot, but in practice this presents many potential issues. Forcing a child with a low attention span to repeatedly do the same question over and over can lead to easy loss of interest, hesitation and bias in the results, as the robot’s judgement and reaction may be registered negatively by the child using it. So there must be a balance and lenience when it comes to letting bad responses through. Our implementation is very basic but makes it so that little pressure is experienced by the user. Maybe some level of repetition on the spot could help reduce the amount of bad attempts but this needs extensive testing which we could not complete within the allocated time.

We tested over many iterations the microphone and speaker clarity to make sure that what is being recorded is as clear and understandable as possible. Obviously there was a limit to the amount we could improve and increase the clarity of the recordings due to the quality of the microphone and speaker, but we tweaked it such that it was absolutely clear what was being said. Also all of the response and question progression mechanisms the robot needed for carrying out diagnostic test questions were rigorously tested to make sure they work correctly.

All testing was carried out by members of the team as unfortunately due to the limited time and nature of the product being created for this project, we were unable to get permission and carry out a test with an actual child participant. This robot is technical under the umbrella of a medical device and also is meant for children, so getting clearance from the ethics board took too long for the time allocated.

We also realized through testing that due to the RAM limitations of the ESP32 recording uploading was powerfully limited and ran into many issues. This is because the EPS32 would need to save space in RAM and flash memory for the running code, the .wav recording and the uploading protocols simultaneously. So files needed to split up into chunks and sent over the internet. However this is extraordinarily complex and out of the scope of possible methods to implement and test in the given time. So due to the hardware we decided to only provide small files over transfer to demonstrate at the very least the upload capability. However this was only the case with recordings as, save file and question updates required a lot less information and space so these could be implemented. For recording processing we manually uploaded recordings to the database for the app to process and the processing passed the test very easily after some short debugging.

Potential Improvements

To improve the robot 3 there are three main things that our research and testing has indicated:

Firstly the child to robot interaction needs even more testing and needs to refined to take into account more edge cases. Our current implementation is quite basic so many situations may occur where the robot’s simple reaction is insufficient and there may need to be more work done to maximize the ability collect responses. It may be pertinent to add a reaction to repeat a question or even positively respond to good recordings to make sure the child is engaged and comfortable. However additions and changes must be done in a measured and tested capacity to preserve the low judgment and low pressure the interaction needs to be successful, which is something our basic implementation already did. Furthermore the use of servo motors in the ears really increased the robots appeal and adding more engaging interactive elements like this could greatly improve engagement.

Secondly, the robot’s hardware needs to be improved. The ESP32 did do well enough for the prototype version of this robot but it limited the performance in multiple ways. The file transfer and recording transfer over Wi-Fi was incredibly due to RAM limitations, and making the robot behaviour more complex definitely necessitates more memory and CPU power. Potentially using a raspberry PI in place of the ESP32 could be a better option, as it has much more memory, Wi-Fi capabilities and resources to work with. On top of this the microphone and speakers need to be improved to produce higher quality recordings. Higher quality recordings obviously is better for the therapist, but can also lead to more possibilities in the realm of processing on the app as now more detailed and accurate analysis can be done on the sound and speech data.

Finally the robot needs better final assembly and housing. The current implementation is a little fragile and could break if handled carelessly. The plush also became deformed in its shape when fitting the electronic in place so this needs to be improved to maximize appeal. In future we should consider soldering more of the parts together to make sure they don’t disconnect and add interior protective casing to electronics to stop them getting damaged.

Interviews

Introduction

To develop a robot that effectively assists in speech therapy, it's important to understand current therapy practices and identify potential areas for improvement. We conduct interviews with speech therapists and parents of children with speech impairments to gain insight into existing methods, challenges, and how a robot might assist in this process.

Current Practices and Challenges

Participants (speech therapists) are asked about challenges they commonly face in data analysis, therapy workloads, and the overall diagnostic process. Feedback is collected regarding their experiences, highlighting the most demanding or time-consuming aspects of existing methods. Therapists are also asked about which speech impairments could detected by voice recognition (i.e. the robot), including specific signs or indicators commonly used for diagnosis.

User Experience and Engagement

The interviews aim to provide a better understanding of children's experiences during diagnostic assessments, particularly regarding factors leading to fatigue, anxiety, or disengagement. Identifying these factors is important in order to design the proposed speech therapy robot so that it may make therapy sessions more engaging and less stressful. Participants discuss factors they believe are most important for keeping children actively involved, including interactive elements, rewards, vocal feedback, and gamification strategies.

Potential for Robotic Assistance

Participants are asked to share their feelings on integrating a plush robot into speech therapy sessions. Discussions cover how child patients might respond to such a robot, whether positively or negatively, and therapists' feelings regarding interactions with social robots. Therapists should address how comfortable they are working alongside a robot, their views on robots potentially replacing certain tasks, and the inherent limitations a robot has compared to human therapists. These insights help clarify how the user groups view social robots, whether as complementary tools or replacements for existing therapist tasks.

Practical Considerations

To better refine the practical design of the robot, the interviews cover preferred methods for interacting with the device, including physical buttons, voice commands, and ways to review collected speech data (e.g., via a mobile app, web application, or directly from the robot). Participants are also asked to share how frequently they could imagine using or recommending a speech therapy robot. Furthermore, the therapists are asked to consider whether they believe the robot might increase children's engagement or participation in therapy. Finally, participants provide their preferences and wishes regarding the physical appearance of the robot.

Privacy & Ethics

Due to the sensitive nature of speech data collection and pediatrics in general, the interviews explicitly address privacy and ethical considerations. Participants are asked if they would be comfortable with the robot recording children's speech for later review by therapists, and their opinions on third-party access to such data, especially involving AI technologies such as large language models (LLMs). The interviews further discuss whether data should be stored locally on the robot or if participants would accept secure storage on remote servers. Participants are also asked to share their thoughts on whether the robot should serve as a complement or potentially replace certain aspects of traditional, in-person therapy sessions. The participants highlight some of their biggest hesitations or concerns related to integrating technology into speech therapy practices.

Method

Interviews are conducted either in-person or through video calls, each lasting around 15 to 30 minutes. With participants' consent, these interviews are recorded and transcribed for detailed analysis. Participants answer open-ended questions about current diagnostic methods, challenges in analyzing speech data, potential biases in diagnosis, and general experiences with speech therapy sessions. Additionally, we ask the participants' to share their thoughts on using a friendly, plush robot during therapy. Topics include possible benefits of the robot, preferred design features, usability, and data privacy.

Interview questions are adjusted based on the participant’s role. Speech therapists are mainly asked about technical and diagnostic challenges and practical robot-related considerations. Parents are encouraged to discuss their children's experiences with therapy and assessments, as well as share their experiences as the parent in this regard. The insights from these interviews aim to guide us in improving the design of the proposed speech therapy robot.

Results

Final Product Interviews

The plush robotic prototype was the subject of two distinct interviews in order to gather comprehensive input. The principal speech therapist, with whom the project had been closely collaborating, was interviewed first. In the second interview, a professional showed interest in using such a product in the actual world.

First Interview

A presentation of the plush robot in its off-power mode kicked off the first session. The therapist commented right away on the plush toy's aesthetic appeal and friendliness, highlighting in particular how well the electronic components were incorporated into the design.

The plush toy was then turned on for a live demonstration. The therapist appreciated the robot's general friendliness and the way it conveyed happiness with its natural speaking rhythm, tone, and approachable manner. However, she noted that audio quality was a major problem, emphasizing that audio recordings must be clear in order to accurately analyze speech, especially when catching small speech abnormalities.

The therapist also recommended including a camera in further prototypes. She clarified that therapists use nonverbal cues, facial expressions, and overall body language in addition to audible information while conducting clinical exams. As a result, adding the ability to record visual information would greatly improve diagnostic precision.

Second Interview

The therapist came to the conclusion that while the plush robot successfully solves the stated issue, future advancements must include enhanced audio quality and visual recording capabilities.

The second interview was with a speech therapist who is currently employed by Diadolab^[60], a startup. Within an hour, this startup seeks to deliver assessments of speech intelligibility and fluency, along with the automated creation of an extensive report. The project uses AI-based audio analysis techniques to partly tackle the same fundamental issue.

She gave the plush robot design a great response, especially praising the idea of incorporating speech evaluation features into a kid-friendly toy. In an effort to lessen the stress that young patients frequently undergo, the therapist underlined the important advantages of performing speech evaluations in a casual, non-clinical setting. The ability to record the child's reactions and immediately transfer those recordings to an external program for review was another feature she highlighted.

She also talked out loud about how using a soft toy to help with speech assessments may potentially be useful idea to keep in mind for own startup's automated assessment system in the future. The speach therapist even invited on of the team members to join the startup due to how she liked how innovative the project was.

The therapist concluded by emphasizing that the project tackles an important and sometimes disregarded issue, namely the elimination of biases during evaluations. Ending the session on an amusing remark that this project genuinely represented innovation within the field.

https://www.formationsvoixparole.fr/logiciels/diadolab-pour-la-parole/

Conclusion

In this project, we developed a child-friendly robot designed to support speech therapists in diagnosing speech impairments in young children. The robot takes the form of an interactive animal plushie, chosen specifically to appear approachable and comforting to children during assessment sessions. Its design integrates a microphone, speaker, and pressure sensors, allowing children to interact naturally by talking to the plushie and squeezing it to respond.

The robot engages children in structured diagnostic tasks by asking questions and prompting verbal responses in a conversational, playful manner. It is also programmed to allow the children to take breaks, promoting a low-stress environment that accommodates short attention spans and helps reduce cognitive fatigue. By creating a more engaging and child-focussed therapy experience, the robot aims to encourage more accurate and consistent speech samples while supporting the therapist's work.

This project combines elements of human-centered design, assistive robotics, and speech-language pathology to explore how socially assistive technology can make early diagnosis more accessible, comfortable, and effective for both children and clinicians.

Discussion

The project aimed to address several critical limitations inherent in traditional pediatric speech diagnostic methods by introducing an interactive plush robot specifically designed to support therapists and engage young children. Extensive testing and iterative refinement demonstrated the robot’s potential to significantly enhance diagnostic processes by reducing fatigue, improving child engagement, and ensuring greater consistency and reliability in data collection.

The interactive plush robot effectively tackles key limitations associated with current diagnostic approaches, particularly the cognitive fatigue and decreased engagement experienced during lengthy clinical sessions. By segmenting assessments into shorter, playful, and gamified interactions, the robot helps maintain children's attention and interest, directly improving the quality of their responses. This aligns with documented concerns in existing literature regarding the impacts of fatigue and prolonged sessions on diagnostic accuracy^[3]^[27].

Additionally, the robot consistently captured high-quality audio data, facilitating therapists’ work by enabling asynchronous and detailed analysis. By reducing therapist fatigue and subjective variability, the robot increases diagnostic objectivity, thereby potentially enhancing the effectiveness of subsequent therapeutic interventions.

While the project made significant progress, several limitations emerged during the development and research phases:

Interview Limitations: Interviews conducted to gather user insights had constraints due to the relatively small sample size and the use of convenience sampling, potentially overlooking diverse perspectives critical to comprehensive research. Furthermore, the extensive length and detailed nature of these interviews may have led to respondent fatigue, possibly affecting the quality of collected data. Social desirability bias could also have influenced participants' responses, particularly because of the familiarity between interviewers and interviewees, potentially resulting in overly positive feedback.

Technological Limitations: The development process faced technical challenges related to audio recording quality and consistent data handling. The project's dependence on specific hardware components and local storage solutions restricted overall flexibility and scalability. Despite iterative improvements, occasional issues such as background noise and audio interference persisted. The reliance on physical components like buttons, motors and LEDs also required careful consideration regarding durability, robustness, and child-safe design to ensure practical, long-term usability.

Privacy and Ethical Limitations: Adhering strictly to GDPR, MDR, and other privacy regulations introduced considerable complexity, particularly around data handling practices, parental consent mechanisms, and transparency requirements. These constraints limited the project's potential to integrate advanced third-party AI analytics, narrowing the scope of certain beneficial features. Additionally, the stringent data protection measures increased technical implementation complexity, underscoring the challenge of balancing advanced functionality with regulatory compliance.

The project's outcomes have substantial implications for both clinical and educational settings. For speech therapists, the robot offers notable improvements by reducing repetitive diagnostic tasks and supporting detailed asynchronous review and analysis. This allows therapists to spend more time on focused therapeutic interventions and in-depth analysis, potentially improving speech and language outcomes for children.

In educational contexts, the robot can help alleviate significant resource constraints by decreasing waitlists and enabling remote diagnostic assessments in underserved areas. This capability enhances equitable access to essential speech therapy services, improving early intervention rates and bridging healthcare availability gaps. Introducing this technology in educational settings also raises awareness about speech impairments, encouraging proactive management and intervention.

To further improve the robot’s diagnostic capabilities and user experience, the following improvements could be considered:

Improvements in Audio Processing: Integration of enhanced noise-cancellation techniques and pediatric-focused speech recognition technologies would significantly boost diagnostic accuracy and clarity in recordings, providing therapists with more actionable data.
Improvements in (Primary) User Interaction - Therapist Workflow: Developing advanced interactive elements such as intuitive voice commands and personalized gamified interactions could further enhance child engagement and accessibility, particularly for younger children. Adaptive difficulty levels tailored to individual capabilities and engagement metrics would further increase usability and effectiveness.
Improvements in (Secondary) User Interaction - Child Engagement: Implementing advanced AI-driven analytical methods, including automatic detection of speech irregularities and adaptive real-time feedback, could substantially reduce therapist workloads and improve diagnostic precision. Employing machine learning models trained on comprehensive pediatric speech datasets would facilitate early and accurate identification of subtle impairments.
AI-driven Diagnostic Features: Building upon the established set of diagnostic questions, future versions could offer therapists customizable and comprehensive therapy programs tailored specifically for various speech disorders. By expanding the question database and enabling therapists to tailor assessment protocols dynamically, the robot could support more diverse and specialized diagnostic scenarios.
Extending Speech Therapy Programs: Building upon the established diagnostic framework already embedded in the robot, future iterations could offer a more comprehensive therapy suite. By expanding the database of validated speech tasks (e.g., sound discrimination, articulation drills, spontaneous speech prompts), and enabling therapists to configure custom pathways based on disorder type, the robot could become a more active tool in long-term treatment, not just diagnosis. This shift from a single-session assessment tool to a platform supporting ongoing therapy sessions would enable remote, scalable, and individualized care.
Conducting Further Research: Conducting comprehensive longitudinal studies across diverse populations and varied care settings (e.g., urban vs. rural, multilingual households, neurodiverse children) would help validate and refine the robot's effectiveness. In particular, comparing robot-facilitated diagnostics with conventional methods over multiple sessions would highlight strengths, limitations, and necessary adaptations for broad adoption.
Extensive Testing and Edge Case Handling: While the current robot handled typical interactions well, broader testing should explore a wider range of real-world behaviors—including emotional outbursts, silence, accidental inputs, and deliberate non-cooperation. Building upon this, more nuanced fallback strategies (e.g., rephrasing a question, offering encouragement, or pausing automatically) can be integrated. Establishing a protocol for how the robot should escalate or report unclear sessions to the therapist would also enhance clinical utility and accountability.
Data Security Measures and Compliance: As the system scales, continued investment in encryption protocols, secure boot mechanisms, and robust authentication procedures (e.g., multi-factor logins, timed access tokens) will be essential. Furthermore, including transparent interfaces for parental oversight (e.g., viewing and revoking session data) would foster greater trust. Automated consent management tools, audit trails, and data anonymization layers should be regularly updated to ensure the platform remains compliant with evolving data protection regulations and industry best practices.

Bibliography

https://www.researchgate.net/publication/384968031_Exploring_solutions_to_reduce_waiting_lists_for_speech_and_language_services_in_the_Netherlands

Appendix

Time reporting

Week 1

Name	Task	Time spent
Andreas Sinharoy	Robot and Problem Ideation and Research into the Idea	2 hours
Luis Fernandez Gu	Problem Ideation and Research	2 hours
Alex Gavriliu	Research into data privacy requirements in EU	3 hour
Theophile Guillet	Research on projet idea	2 hours
Petar Rustić	Creating the wiki structure, literature research	2 hours
Floris Bruin	Problem Ideation, literature research	2 hours
All		13 hours

Week 2

Name	Task	Time spent
Andreas Sinharoy	Writing the Planning and Introduction sections of the wiki page	3 hours
Luis Fernandez Gu	Wrote and planned Requirements Section	2 hours
Alex Gavriliu	creating appropriate structure for legal and privacy section	5 hours
Theophile Guillet	Conducted interview, and wrote hardware section	2 hours
Petar Rustić	Literature research, writing the USE analysis	6 hours
Floris Bruin	Literature research	5 hours
All		23 hours

Week 3

Name	Task	Time spent
Andreas Sinharoy	Helped obtain materials for the prototype, contacted and scheduled interviews	3 hours
Luis Fernandez Gu	Helped with interviews and scheduled/did one.	1 hours
Alex Gavriliu	Helped outline specifc functions of the robot based off of the interviews performed by other team members	3 hours
Theophile Guillet	Started working on the hardware prototype, planned circuit	8 hours
Petar Rustić	Researching therapy processes, specific speech impairments, telehealth practices, looking for interview candidates	3 hours
Floris Bruin	Looked at past group's projects, literature research, updating the wiki	5 hours
All		22 hours

Week 4

Name	Task	Time spent
Andreas Sinharoy	Started construction of the robot	6 hours
Luis Fernandez Gu	Started planning the robot (mostly whats written in specification, planning how auth will work, how communication will work, selecting a framework etc.)	5 hours
Alex Gavriliu	Assisted in microphone testing and implementation	4 hours
Theophile Guillet	Implemented circuit on breadboard and worked on audio improvement	6 hours
Petar Rustić	Writing the interview introduction and method sections, started writing 'Literature Study'	4 hours
Floris Bruin	Updating the wiki	5 hours
All		29 hours

Week 5

Name	Task	Time spent
Andreas Sinharoy	Was not in the Netherlands, but managed to help a bit	1 hour
Luis Fernandez Gu	Worked on building the App (auth, encryption, robot-app communication, etc)	16 hours
Alex Gavriliu	Helped with robot wiring and coding, as well as helping further specifying application ideas and implementation	6 hour
Theophile Guillet	Worked on soldering the circuit on PCB	8 hours
Petar Rustić	Out sick, looking into EU GDPR eHealth regulations	2 hours
Floris Bruin	Sick the whole week	0 hours
All		33 hours

Week 6

Name	Task	Time spent
Andreas Sinharoy	Helped construct the circuitry of the robot, helped created the code/software responsible for communicating beteen the robot and the child, updated the wiki page, filled out the ethical consent form, and messaged and contacted the ethics committee to streamline obtaining ethical consent in time.	20 hours
Luis Fernandez Gu	Continued working on the app focusing more on the Robot code now that would allow the two to communicate with eachother.	10 hours
Alex Gavriliu	Was sick for most of the week but managed to assist some robot construction	6 hours
Theophile Guillet	Finalised soldering the circuit on PCB, and worked on the integrating circuit in the plush	17 hours
Petar Rustić	Continuing writing the literature study, updating the USE analysis, interview with two parents (support users)	6 hours
Floris Bruin	Partially still sick, worked on updating the wiki	7 hours
All		66 hours

Week 7

Name	Task	Time spent
Andreas Sinharoy	Programmed software of robot, finalized robot, helped with presentation, finalized wiki	18 hours
Luis Fernandez Gu	Finalized App and integration with the robot (the latter of which was done collaboratively). Helped with final presentation and finalizing the wiki.	18 hours
Alex Gavriliu	Recorded all voice line for the robot and helped with the sewing of the speaker needed for the plushy containing the electronics. Also helped prepare for the presentation. Also worked a lot on the report	16 hours
Theophile Guillet	Finalised the plush and worked on the wiki	17 hours
Petar Rustić	Finishing the literature study and USE analysis, adding additional article references to legal and privacy concerns, writing the discussion, creating persona-boards	18 hours
Floris Bruin	Finishing up the wiki	18 hours
All		104 hours

Total Hours Spent


Name	Total time Spent
Andreas Sinharoy	53 hours
Luis Fernandez Gu	54 hours
Alex Gavriliu	45 hours
Theophile Guillet	60
Petar Rustic	41 hours
Floris Bruin	42 hours
All	295 hours

[1] ttps://www.asha.org/public/speech/disorders/SpeechSoundDisorders/

[2] ttps://leader.pubs.asha.org/doi/10.1044/leader.FTR3.19112014.48?utm_source=chatgpt.com

[:0-3] {Jump up to: 3.0} ^3.1 ^3.2 https://pmc.ncbi.nlm.nih.gov/articles/PMC4790980/

[4] ttps://www.mdpi.com/2071-1050/13/5/2771

[5] ttps://www.scirp.org/reference/referencespapers?referenceid=3826718

[6] ttps://pubmed.ncbi.nlm.nih.gov/17238408/

[7] ttps://www.asha.org/public/speech/disorders/SpeechSoundDisorders/

[8] ttps://www.rcslt.org/speech-and-language-therapy/clinical-information/speech-sound-disorders/

[9] ttps://www.nhs.uk/conditions/speech-and-language-therapy/

[10] ttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3560814/

[11] ttps://www.cnld.org/how-long-should-a-childs-attention-span-be/

[12] ttps://digitalcommons.montclair.edu/etd/1386/

[13] ttps://pmc.ncbi.nlm.nih.gov/articles/PMC10240099/

[14] ttps://www.isca-archive.org/wocci_2017/grossinho17_wocci.pdf

[15] ttps://pmc.ncbi.nlm.nih.gov/articles/PMC9860467/

[16] ttps://www.researchgate.net/publication/265099078_Investigating_the_Role_of_Language_in_Children's_Early_Educational_Outcomes

[17] ttps://www.researchgate.net/publication/364191716_Parent_Satisfaction_With_Pediatric_Speech-Language_Pathology_Telepractice_Services_During_the_COVID-19_Pandemic_An_Early_Look

[:5-18] {Jump up to: 18.0} ^18.1 https://pmc.ncbi.nlm.nih.gov/articles/PMC5708870/

[:6-19] {Jump up to: 19.0} ^19.1 https://pmc.ncbi.nlm.nih.gov/articles/PMC9620692/

[:2-20] {Jump up to: 20.0} ^20.1 ^20.2 ^20.3 https://psycnet.apa.org/record/2023-98356-001

[21] ttps://www.the-independent.com/news/uk/england-nhs-england-community-government-b2584094.html

[22] ttps://www.sensationalkids.ie/shocking-wait-times-for-hse-therapy-services-revealed-by-sensational-kids-survey/

[23] ttps://pmc.ncbi.nlm.nih.gov/articles/PMC10766845/

[:12-24] {Jump up to: 24.0} ^24.1 https://www.cdc.gov/nchs/products/databriefs/db205.htm

[25] ttps://pmc.ncbi.nlm.nih.gov/articles/PMC5831094/

[26] ttps://pmc.ncbi.nlm.nih.gov/articles/PMC2740746/

[:11-27] {Jump up to: 27.0} ^27.1 https://www.scirp.org/(S(ny23rubfvg45z345vbrepxrl))/reference/referencespapers?referenceid=2156170

[28] ttps://pmc.ncbi.nlm.nih.gov/articles/PMC6802914/

[:1-29] {Jump up to: 29.0} ^29.1 https://pubmed.ncbi.nlm.nih.gov/732285/

[30] ttps://www.liebertpub.com/doi/full/10.1089/eco.2022.0087

[31] ttps://pubs.asha.org/doi/abs/10.1044/2021_PERSP-21-00284

[32] ttps://pmc.ncbi.nlm.nih.gov/articles/PMC10851737/

[33] ttps://mhealth.jmir.org/2020/10/e18858

[:3-34] {Jump up to: 34.0} ^34.1 https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/fnhum.2016.00640/full

[35] ttps://www.researchgate.net/publication/365865210_Artificial_intelligence_in_the_diagnosis_of_speech_disorders_in_preschool_and_primary_school_children

[36] ttps://www.sciencedirect.com/science/article/pii/S2772632024000412

[:4-37] {Jump up to: 37.0} ^37.1 https://journals.sagepub.com/doi/10.1177/2050157919843961

[38] ttps://pubmed.ncbi.nlm.nih.gov/35637787/

[39] ttps://www.jbe-platform.com/content/journals/10.1075/pc.12.1.03dau

[40] ttps://www.ytl-e.com/news/quarterly-publication/What-are-the-main-wireless-internet-of-things-communication-technologies.html

[41] ttps://resources.pcb.cadence.com/blog/2022-transmission-rate-vs-bandwidth-in-bluetooth-technology

[42] ttps://kb.netgear.com/19668/Link-Rate-and-Transfer-Speed

[43] ttps://www.unitrends.com/blog/secondary-storage/

[44] ttps://www.techtarget.com/whatis/definition/amplifier

[45] ttps://www.tinytronics.nl/en/data-storage/sd-cards/sandisk-extreme-64gb-v30-uhs-i-u3-a2-microsdxc-card-with-sd-card-adapter

[46] ttps://medium.com/@Xclusivesocial/sample-rate-bit-depth-bitrate-4ff624cc97db

[47] ttps://electronics.stackexchange.com/questions/232885/how-does-an-sd-card-communicate-with-a-computer

[:7-48] {Jump up to: 48.0} ^48.1 https://www.arduino.cc/

[:8-49] {Jump up to: 49.0} ^49.1 ^49.2 https://www.raspberrypi.com/

[:9-50] {Jump up to: 50.0} ^50.1 https://www.espressif.com/en/products/socs/esp32

[51] ttps://www.stevemeadedesigns.com/board/topic/123070-what-causes-amps-to-get-hot/

[52] ttps://en.wikipedia.org/wiki/Class-D_amplifier

[53] ttps://www.tinytronics.nl/en/audio/amplifiers/dfrobot-max98357-i2s-amplifier-module-2.5w

[54] ttps://www.tinytronics.nl/en/sensors/sound/czn-15e-electret-condenser-microphone

[55] ttps://www.tinytronics.nl/en/sensors/sound/inmp441-mems-microphone-i2s

[56] ttps://store.arduino.cc/en-nl/products/arduino-nano?srsltid=AfmBOopHWCPAhG789tpwvIgPQ0eYSG9zYQgU-eG5zkGCjqfMEy9uXbsx

[57] ttps://store.arduino.cc/en-nl/products/arduino-uno-rev3?srsltid=AfmBOoqcCP-czp1XnWr-He6aATGlYe2WdyM29Z5ZlCA5GPsnB9Ge2mjG

[58] ttps://www.tinytronics.nl/en/development-boards/microcontroller-boards/with-wi-fi/unexpected-maker-tinypico-v3-esp32-development-board-usb-c

[:10-59] {Jump up to: 59.0} ^59.1 https://www.tinypico.com/

[60] ttps://www.formationsvoixparole.fr/logiciels/diadolab-pour-la-parole/

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]

[29]

[30]

[31]

[32]

[33]

[34]

[35]

[36]

[37]

[38]

[39]

[40]

[41]

[42]

[43]

[44]

[45]

[46]

[47]

[48]

[49]

[50]

[51]

[52]

[53]

[54]

[55]

[56]

[57]

[58]

[59]

[60]

PRE2024 3 Group3: Difference between revisions

Latest revision as of 01:54, 11 April 2025

Group Members

Introduction

Problem Statement

Objectives

USE Analysis

Users

Speech-Language Therapist (Primary User)

Needs:

Child Patient (Secondary User)

Needs:

Parent/Caregiver (Support User)

Needs:

Personas

Persona 1: Parent

Persona 2: Child

Persona 3: speech therapist

Scenarios

Scenario 1:

Requirements

For the Therapist

For the Child and Parent

Society

Enterprise

State of the Art

Literature study

Existing robots

RASA robot

Automatic Speech Recognition

Nao robot

Kaspar robot

Requirements

Design Specifications

Functionalities

UI/UX

Data Handling and Privacy

Performance

Application

Legal & Privacy Concerns

Data Collection & Storage

User Privacy & Consent

Security Measures

Legal Compliance & Regulations

Ethical Considerations

Third-Party Integrations & Data Sharing

Liability & Accountability

User Safety & Compliance

Interviews and Data collection

First interview

Design

Device Description Appearance:

Internal Hardware:

Micro SD card

Amplifier

Microphone

Microcontroller Unit:

Materials Bought

Planning Phase

Testing Phase

Soldering Phase

System Specification

Software

Robot

Robot Activation

Robot Greeting and Termination

Moving Motors

Retrieving Questions

Wait for Respone

Record Response

Answer Question

Upload and Notify

Flowchart of User Device Interaction

Application

Authentication and Access Control

Core Functionalities

Audio Table and Review Panel

Quality Flagging and Reprocessing

Transcript Generation

Security and Privacy

`POST /api/request-upload`

`POST /api/notify-upload`

`GET /api/receive-updates`