4/18/2018                    Patent Serial No. 9,947,322          NAU Case 2013-015

Title: “Systems and Methods for Automated Evaluation of Human Speech”

This invention is a method of detecting prosody in human speech in terms of prominent syllables, tone units, and tonic syllable tone choices as defined by David Brazil’s framework for representing prosodic features in discourse. It automatically determines these communicative features from a raw audio wave file in six steps: 1) determine the phones that make up the utterance, 2) group the phones into syllables, 3) identify the prominent syllables, 4) divide the utterance into tone units, 5) determine the tone (falling, rising, rising-falling, falling-rising, or neutral) of the tonic syllables (the last prominent syllable in each tone unit), and 6) determine the pitch (low, mid, or high) of the tonic syllables. The invention yields a program that can be used by language learners, teachers, linguists, electronic engineers, computer scientists, or anyone in speech communication who wishes to better operationalize the properties of human speech in a machine. Employing a text-independent system, the invention provides a unique model of the salient prosodic features of spoken language and integrates the stress, pitch, and tone features relevant to improving the performance of automated speech systems.
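The definition of a tonic syllable used in step 5 (the last prominent syllable in a tone unit) can be illustrated with a minimal Python sketch. This is not the patented implementation: the `Syllable` class, the `tonic_syllable` function, and the example data are hypothetical, and the sketch assumes that prominence (step 3) and tone-unit boundaries (step 4) have already been determined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Syllable:
    """Hypothetical container for one syllable of an utterance."""
    text: str
    prominent: bool  # assumed to come from step 3 (prominence detection)


def tonic_syllable(tone_unit: List[Syllable]) -> Optional[Syllable]:
    """Return the last prominent syllable in a tone unit, or None if there is none."""
    for syllable in reversed(tone_unit):
        if syllable.prominent:
            return syllable
    return None


# Hypothetical tone unit: "the RE-sults were CLEAR" with two prominent syllables.
example_unit = [
    Syllable("the", False),
    Syllable("RE", True),
    Syllable("sults", False),
    Syllable("were", False),
    Syllable("CLEAR", True),
]
```

In this sketch, `tonic_syllable(example_unit)` selects the syllable "CLEAR", which is the syllable whose tone and pitch would then be classified in steps 5 and 6.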


9/18/2022 – present (ongoing)

Kang, O., Looney, S., Hansen, J., & Hirschi, K. (2023). Second Language University Speech Intelligibility Corpus (L2 USL). Linguistic Data Consortium. Philadelphia.


Title: Second Language Speech Production: Formulation of Objective Speech Intelligibility Measures and Learner-Specific Feedback. (NSF, EAGER Funding, 2021-2023)

Part 1: Broad Description: This Early-concept Grants for Exploratory Research (EAGER) project focuses on exploring and developing a novel operational collection of speech, language, and perception-based measures to objectively assess speech intelligibility in second language (L2) speech production, as well as on providing effective learner-specific feedback. With the rise of English as an international language, intelligibility-based successful communication has been emphasized over native-like accents. However, L2 teachers often raise concerns about learners’ slow or stagnant pronunciation progress. Primary reasons for this problem may include difficulties in perceptually discerning changes in learners’ speech and in interpreting learners’ speech patterns without a learner-specific intelligibility assessment profile. Today, teachers have no systematic way to assess each student’s speech changes, nor can students monitor and track feedback related to their pronunciation learning progression. Therefore, an exploratory and transformative method is introduced for measuring speech intelligibility that provides both teachers and learners with objective and individualized feedback. This exploratory project is proposed for EAGER funding in order to establish a baseline working framework for creating operational objective measures and a proof-of-concept assessment feedback system for teachers and learners. This approach will help teachers gauge learners’ intelligibility levels and allow learners to self-regulate their learning progress incrementally over time. The long-term innovation is expected to benefit skilled US professionals from non-English speaking countries who work in various STEM (science, technology, engineering, and mathematics) fields.
Additionally, this interdisciplinary project provides various opportunities for hands-on training and experience for both graduate and undergraduate students in the fields of language education, applied linguistics, computer engineering, and speech technology.

Part 2: Technical Description: This project explores an idea to assess intelligibility in speech communications based on multiple individual speech measures for non-native speakers. The ideas are currently in their very early stages of development, and a large portion of the research ideas are untested. In order to establish the ground truth of potential individual speech production intelligibility measures, the implementation and feasibility of this intelligibility feedback approach must be validated with evidence. By employing advanced Automatic Speech Recognition (ASR)-based accent classification technology built on machine learning, the team of researchers plans to provide learners with measured speech property information through an operational and discriminating set of objective speech intelligibility measures. The current innovation builds on language skill acquisition theory with a functional analytic-linguistic approach, arguing that explicit and metalinguistic feedback plays a pivotal role in moving learners forward in their L2 development. The vision is enabled by on-going research on auditory-based neurogram and spectrogram orthogonal polynomial measures that predict speech intelligibility, employing the learners’ unconstrained speech utterances. The project will contribute to scientific knowledge of what constitutes L2 intelligible speech, advance understanding of how individualized objective speech intelligibility feedback affects L2 speech development, create a foundational collection of speech/auditory/signal processing measures as well as ASR/DNN-driven measures that assess a speaker’s intelligibility, and identify efficient ways of implementing this technology in L2 learning contexts.

Title: Developing an Affective Interactive Oral Communication Tutor, Research Bridge or Seeds Awards (NAU RBS Funded, 2019-2020)

The US workforce increasingly draws upon a linguistically and culturally diverse pool of talent. However, limited English speaking ability or the presence of a strong non-native accent can reduce employment opportunities for those entering the workplace. Improving foundational communication skills and intelligibility is therefore integral to the professional success of non-native English (or English as a second language) speakers. The research intends to develop and evaluate an affective interactive oral communication tutor (AIOCT) system in which language learners can practice and improve oral communication skills at their own convenience while receiving instantaneous, personalized guidance from the computer. The system will employ a device to engage in real-life interaction with learners and use advanced technologies based on automatic speech recognition (ASR) and virtual environments to provide practice and personalized feedback on English oral communication skills. The current pilot study will use MySpeechTainer as an initial version of AIOCT. The AIOCT will also be equipped with logging capabilities to record all user-system interactions. This learning system will be available on a range of devices including smartphones, so that learning can be pervasive and independent of time and space. System evaluation will address both the individual components and the complete system through a one-month study investigating the learning gains. Logs of user-system interactions will be analyzed to gain insight into the processes underlying non-native language learning. Currently, a mobile-assisted pronunciation training program has been developed, and we are piloting the program with international teaching assistants at various institutions.

Title: Fairness of using different accents in Duolingo listening tasks. 2021 Duolingo Competitive Research Grants program (funded by Duolingo, 2021-2022)

There is an increasing demand for an English as a lingua franca perspective to be incorporated into international listening test materials. Primarily using a prestigious inner circle form of North American English as the sole listening stimulus may no longer be ecologically valid, especially in the context of globalization. Scholars (Hamp-Lyons & Davies, 2008) argue that English proficiency tests should now adopt an English as an International Language approach over reference to traditionally standard varieties. However, the inclusion of samples produced by speakers of outer and expanding circle English varieties (e.g., India, Mexico, Korea) has been largely avoided. Accordingly, the current project aims to answer an on-going validity question of whether international tests of English proficiency should or should not privilege a standard variety of English in order to be fair to speakers of non-standard varieties (Hamp-Lyons & Davies, 2008; Kang et al., 2019). More specifically, the project intends to examine (1) to what extent different or shared English accents have an impact on listeners’ performance in the Duolingo listening tests, (2) to what extent different English accents affect listeners’ performances in two different task types (i.e., “Yes/No” Vocabulary and Dictation), and (3) what listeners’ overall attitudes are towards the inclusion of different English accents in the Duolingo English Test (DET) and how these attitudes are associated with the listening test scores. Speakers of four distinct English varieties will be recruited to produce speech samples for the Duolingo listening tasks (i.e., “Yes/No” Vocabulary and Dictation). Their first languages will be Chinese, Spanish, English (from India), and Korean, i.e., the most frequent first languages of DET test-takers (LaFlair & Settles, 2020). Listeners who speak with the same four international English accents will be recruited to take the Duolingo listening tests.
The findings of the proposed project will provide important guidance to promote the DET as a test of international English and to better understand fairness, equality, and practicality of designing and administering high-stakes English tests. 

Title: The relationship between learners’ backgrounds and proficiency of young English language learners in Mexico (funded by Alianza Inter-Universitaria Sonora-Arizona, 2019-2021).

The current study examined to what extent the backgrounds of young learners of English (YLE) affect their English proficiency scores and measured fluency features. It further identified the relationships between YLE proficiency and oral performance in storytelling contexts. Fluency was operationalized as speech rates and pauses. Sixth elementary school students (i.e., largely 6th graders) participated from varying English learning backgrounds in Mexico. The YLE information included specific backgrounds (e.g., age, English exposure, parents’ income and education), individual differences (e.g., self-efficacy, test anxiety, cognitive strategy use), and general motivation to learn English. Oral communication skills were measured through story retells and examined in relation to the learners’ proficiency scores. The participants’ speech samples were analyzed for temporal features (Kang & Johnson, 2018). The research design revolved around correlation and regression approaches, which showed that learners’ background (44 to 62%) and individual difference (23 to 37%) factors explained the outcome variables. In addition, parent education along with motivation were the most significant and consistent background factors that predicted proficiency, narrative, and fluency. These results may inform policy and planning for syllabus design, teacher training, materials development, and awareness of language use outside the school context.
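As an illustration of how fluency might be operationalized as speech rate and pauses, the following Python sketch computes three temporal measures from syllable timestamps. It is a minimal sketch and not the study’s procedure: the function name `fluency_measures`, the input format, and the 0.25-second pause threshold are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Assumed minimum silence (in seconds) that counts as a pause; not taken from the study.
PAUSE_THRESHOLD_S = 0.25


def fluency_measures(
    syllables: List[Tuple[float, float]],
    pause_threshold: float = PAUSE_THRESHOLD_S,
) -> Dict[str, float]:
    """Compute simple temporal measures from (start, end) syllable times in seconds.

    The syllable list is assumed to be sorted by start time.
    """
    if not syllables:
        raise ValueError("at least one syllable is required")
    total_time = syllables[-1][1] - syllables[0][0]
    if total_time <= 0:
        raise ValueError("total speaking time must be positive")
    # Silent gaps between the end of one syllable and the start of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(syllables, syllables[1:])]
    pauses = [gap for gap in gaps if gap >= pause_threshold]
    return {
        "speech_rate": len(syllables) / total_time,  # syllables per second, pauses included
        "pause_count": float(len(pauses)),
        "mean_pause_s": sum(pauses) / len(pauses) if pauses else 0.0,
    }


# Hypothetical timestamps: three syllables with one 0.5 s silence before the last.
example = [(0.0, 0.5), (0.5, 1.0), (1.5, 2.0)]
measures = fluency_measures(example)
```

For the hypothetical `example` above, the sketch yields a speech rate of 1.5 syllables per second, one pause, and a mean pause length of 0.5 seconds.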

Title: Investigation of relationship among learner background, linguistic progression, and score gain on IELTS (funded by IELTS, 2019-2020)

This project investigated to what extent IELTS test performances (i.e., overall test scores, speaking section scores, and linguistic constructs of speaking) changed over a period of 3 months. It further examined how learner background variables affected learners’ linguistic progress and band score gains on the IELTS. Fifty-two Korean students enrolled in IELTS preparation classes participated. Participants’ proficiency levels were determined by their in-house placement test scores (i.e., roughly 16 beginner, 17 intermediate, and 19 advanced). Once participants completed the pre-test survey, they took the pre-arranged official IELTS test. Information on participants’ hours of study and target language use was collected weekly. The post-survey and online interviews were conducted at the end of the 3-month period, right after the official IELTS post-test. The individual long-run speaking responses from the pre- and post-tests were used for speech analysis (i.e., pronunciation and lexico-grammatical features) to examine linguistic gains over time. The results showed that students made varying progress in English over the 3-month period, with an average gain of slightly less than half a band (0.3), the largest score gain in writing, and the smallest score gain in speaking. Approximately 60% of the participants gained 0.5 or 1 band. In particular, hours of study and level of proficiency predicted the band score gains most strongly. Together with the amount of target language use, the background variables explained 34% of the variance in the score gains. Fluency features showed the most significant improvement over time, but complex relationships were found between learner background characteristics and speech construct changes. The findings offer useful implications for the development of language testing and assessment as well as for curriculum planning.