Predicting couple therapy outcomes based on speech acoustic features
Introduction
Behavioral Signal Processing (BSP) [1, 2] refers to computational methods that support the measurement, analysis, and modeling of human behavior and interactions. Its main goal is to support decision making by domain experts, such as mental health researchers and clinicians. BSP maps real-world signals to behavioral constructs, which are often abstract and complex, and has been applied in a variety of clinical domains, including couple therapy [1, 3, 4], Autism Spectrum Disorder [5], and addiction counseling [6, 7]. Parallel work focusing on social contexts rather than health domains can be found in [8, 9].
Couple therapy, notably, has been one of the key application domains of BSP. Significant effort has gone into characterizing the behavior of individuals engaged in conversation with their spouses during problem-solving interaction sessions. Researchers have explored information gathered from various modalities, such as vocal patterns of speech [3, 4, 10, 11], spoken language use [1, 12], and visual body gestures [13]. These studies are a promising step toward automated support systems that give psychotherapists objective measures for diagnosis, intervention assessment, and planning. This entails not only characterizing and understanding a range of clinically meaningful behavioral traits and patterns but, critically, also measuring behavior change in response to treatment. Systematic, objective study and monitoring of the outcomes relevant to a given condition can facilitate positive and personalized interventions. In particular, in clinical psychology, predicting the relationship outcome of a couple undergoing counseling (or measuring it directly from the couple's interactions, without relying on couple- or therapist-provided metrics) has been a subject of long-standing interest [14–16].
Many previous studies have manually investigated what a couple's behavioral traits and patterns can tell us about their relationship outcome, for example, whether or not the couple successfully recovers from marital conflict. Monitoring outcomes often involves a prolonged period after treatment (up to 5 years), as well as highly subjective self-reporting and manual observational coding [17]. Such an approach suffers from the inherent limitations of qualitative observational assessment, the subjective biases of experts, and great variability in how couples self-report their behavior. A computational framework for outcome prediction could help assess the therapy strategies employed and the quality of treatment, and could also provide feedback to the experts.
In this article, we analyze the vocal speech patterns of couples engaged in problem-solving interactions to infer the eventual outcome of their relationship (whether or not it improves) over the course of therapy. The proposed data-driven approach focuses primarily on the acoustics of the interaction, which can be obtained unobtrusively and are known to offer rich behavioral information. We combine well-established speech signal processing techniques with novel data representations inspired by psychological theories to design the computational scheme for therapy outcome prediction. We formulate outcome prediction as binary (improvement vs. no improvement) and multiclass (different levels of improvement) classification problems and use machine learning techniques to automatically learn, from the speech signal, the patterns that distinguish these classes.
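To make the classification framing concrete, the following is a minimal, hypothetical sketch: each session is reduced to a few statistical functionals (mean and standard deviation) of frame-level pitch and energy, and a nearest-centroid classifier assigns a label. The descriptors, the names `session_functionals` and `NearestCentroid`, and the choice of classifier are illustrative assumptions, not the features or learning method used in this work.

```python
from math import dist
from statistics import mean, pstdev


def session_functionals(frames):
    """Summarize a session's (pitch_hz, energy) frames as
    [pitch mean, pitch std, energy mean, energy std].

    These descriptors are illustrative assumptions only.
    """
    pitch = [p for p, _ in frames]
    energy = [e for _, e in frames]
    return [mean(pitch), pstdev(pitch), mean(energy), pstdev(energy)]


class NearestCentroid:
    """Assign each sample to the class whose mean feature vector is closest
    (Euclidean distance). Works unchanged for binary or multiclass labels."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, lab in zip(X, y) if lab == label]
            # Column-wise mean of this class's feature vectors.
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        # For each sample, pick the label whose centroid is nearest.
        return [
            min(self.centroids, key=lambda lab: dist(x, self.centroids[lab]))
            for x in X
        ]


# Synthetic toy sessions; the labels are arbitrary and carry no clinical meaning.
train_sessions = [
    [(180.0, 0.60), (220.0, 0.90), (260.0, 0.80)],
    [(190.0, 0.70), (240.0, 0.85), (250.0, 0.75)],
    [(120.0, 0.30), (125.0, 0.32), (118.0, 0.31)],
    [(130.0, 0.35), (128.0, 0.33), (126.0, 0.34)],
]
binary_labels = ["class_A", "class_A", "class_B", "class_B"]

X_train = [session_functionals(s) for s in train_sessions]
clf = NearestCentroid().fit(X_train, binary_labels)

new_session = [(200.0, 0.80), (230.0, 0.90), (255.0, 0.70)]
print(clf.predict([session_functionals(new_session)]))  # ['class_A']
```

Because the centroid rule treats labels as arbitrary keys, the same code covers the multiclass formulation (e.g., several levels of improvement) simply by passing more than two distinct labels. In practice, features on very different scales (pitch in Hz versus normalized energy) would usually be standardized first; that step is omitted here for brevity.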
We compare prediction using features derived directly from speech with prediction using clinically relevant behavioral ratings (e.g., relationship satisfaction, blame patterns, negativity) that experts coded manually after observing the interactions. Note that these human behavioral codes are based on watching videos of the interactions, which provide access to information beyond the vocal patterns on which the proposed prediction scheme solely relies, including language use and visual nonverbal cues.
In addition to evaluating how acoustic features derived directly from the signal compare with manually derived behavioral codes as predictors, we also evaluate outcome prediction when both feature streams are used together.
We also investigate the benefit of explicitly accounting for the dynamics and mutual influence of dyadic behavior in the prediction task. The experimental results show that dynamic functionals, which measure relative vocal changes within and across interlocutors, contribute to improved outcome prediction.
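One simple way to quantify such dynamics is sketched below, purely as an illustration. Given a sequence of speaker turns, each summarized by a single vocal value (for instance, mean pitch), it measures the average absolute change between consecutive turns of the same speaker (within-interlocutor) and between a turn and the partner's immediately preceding turn (across-interlocutor). The function name `dynamic_functionals`, the speaker labels, and the use of mean absolute differences are assumptions for illustration; the dynamic functionals defined in this work may differ.

```python
from statistics import mean


def dynamic_functionals(turns):
    """Compute two hypothetical dynamic functionals from (speaker, value) turns.

    within: mean |change| between consecutive turns of the same speaker.
    across: mean |change| between a turn and the partner's preceding turn.
    """
    last_value = {}  # most recent value observed for each speaker
    within, across = [], []
    prev = None  # (speaker, value) of the immediately preceding turn
    for speaker, value in turns:
        if speaker in last_value:
            within.append(abs(value - last_value[speaker]))
        if prev is not None and prev[0] != speaker:
            across.append(abs(value - prev[1]))
        last_value[speaker] = value
        prev = (speaker, value)
    # Fall back to 0.0 when a sequence is too short to observe any change.
    return {
        "within": mean(within) if within else 0.0,
        "across": mean(across) if across else 0.0,
    }


# Toy turn sequence with per-turn mean pitch in Hz (synthetic values).
turns = [("A", 200.0), ("B", 220.0), ("A", 210.0), ("B", 250.0)]
print(dynamic_functionals(turns))  # within: 20.0, across: about 23.33
```

Keeping the two quantities separate lets a downstream classifier weigh a speaker's own turn-to-turn variability independently of how much the partners' vocal values differ from one turn to the next.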
The remainder of the paper is organized as follows. We first discuss relevant literature. We then describe the Couple Therapy Corpus used in the study, illustrated in Fig 1. Next, we give an overview of the methods for speech acoustic feature extraction and describe how behavioral codes are used as features. We then provide an analysis of the proposed acoustic features, followed by the results of the classification experiments. Finally, we conclude the paper with a discussion of our findings and possible directions for future research.