The Privrec Project

Protecting Privacy in Recorded Conversations

The purpose of this project is to develop a scientific workflow that protects the privacy of individuals and entities in recorded conversations. The project originated with the Computer Science master’s thesis of Scot Cunningham, a 2007 graduate student at Northern Kentucky University (www.nku.edu), written under the supervision of Dr. Traian Marius Truta, Assistant Professor of Computer Science at NKU. The full text of this thesis can be found here. Scot Cunningham is also a Senior Manager at Convergys Corporation. Scot's homepage can be found here.

The Privrec scientific workflow integrates well-known data privacy techniques with custom distortion techniques to identify and distort sensitive audio while keeping data loss minimal in both contextual content and audio signal characteristics. The intended outcome of this research is to enable corpora that contain sensitive audio recordings and their associated transcriptions to be shared among business and academic entities for research and development, and for the improvement of applications that make use of speech technology.

Audio Examples

Original Audio

Distorted Audio

These audio examples demonstrate the results of the distortion techniques on an audio file. Only the word “Chicago” is distorted. Note that the intent here was to distort the audio so that an automatic recognizer is incapable of interpreting the selected audio segment. In some cases, the distortion needed to thwart the recognizer resulted in audio that is still easily discernible to human ears, as is evident in the following example:

Distorted Audio

Distortion intended to thwart human recognition is an area of further study. Human intelligibility of encrypted speech has been studied for decades. In “A Comparison of Four Methods for Analog Speech Privacy” (IEEE Transactions on Communications, Volume 29, Issue 1, Jan 1981, pp. 18-23), several techniques for speech encryption are compared for their ability to thwart human intelligibility. The results of three of these techniques on our audio samples are provided below.

Block Permutation

Frequency Inversion

Frequency Inversion Plus Block Permutation

Although these techniques distort the audio effectively, they either incur high levels of prosody loss or are insecure, and both outcomes conflict with the goals of this work. Refining techniques like these so that they preserve prosody would be an excellent area of further research.
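For readers unfamiliar with the two basic operations named above, the following is a minimal sketch of how they can be expressed on a list of audio samples. It is not the code used to produce the audio samples on this page, and it does not reproduce the exact methods of the 1981 paper. The function names, the block size, and the key are hypothetical and chosen only for illustration. Frequency inversion is shown here in its simplest form: flipping the sign of every other sample moves a component at frequency f to fs/2 - f, where fs is the sampling rate.

```python
import math
import random


def frequency_invert(samples):
    """Flip the sign of every other sample (f maps to fs/2 - f)."""
    return [s if n % 2 == 0 else -s for n, s in enumerate(samples)]


def block_permute(samples, block_size, key):
    """Shuffle fixed-size blocks with a permutation derived from `key`.

    Returns the scrambled samples and the block order that was applied.
    """
    if len(samples) % block_size != 0:
        raise ValueError("sample count must be a multiple of block_size")
    blocks = [samples[i:i + block_size]
              for i in range(0, len(samples), block_size)]
    order = list(range(len(blocks)))
    random.Random(key).shuffle(order)  # keyed, repeatable permutation
    scrambled = [s for index in order for s in blocks[index]]
    return scrambled, order


def block_unpermute(scrambled, block_size, order):
    """Undo block_permute given the block order it returned."""
    blocks = [scrambled[i:i + block_size]
              for i in range(0, len(scrambled), block_size)]
    restored = [None] * len(blocks)
    for position, original_index in enumerate(order):
        restored[original_index] = blocks[position]
    return [s for block in restored for s in block]


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz, for illustration only
    tone = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(64)]
    inverted = frequency_invert(tone)          # 1000 Hz tone becomes 3000 Hz
    scrambled, order = block_permute(inverted, block_size=16, key=7)
    recovered = frequency_invert(block_unpermute(scrambled, 16, order))
    print(recovered == tone)
```

The sign flip uses no key at all, so anyone can undo it by applying it again, which illustrates why inversion on its own is considered insecure. Block permutation reorders the signal in time, which is one reason prosody is lost.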

Here we also include more examples of the "Chicago" utterance with more of the lower-level signals suppressed:

WaveSurfer Snapshots

The following WaveSurfer snapshots show the preservation of prosodic features in the distorted audio using the custom distortion technique.

Waveform and Pitch Contour of Original Audio

Waveform and Pitch Contour of Distorted Audio

Links to third party software used in this project

Praat and WaveSurfer are both used for audio editing and analysis.

CMU Sphinx Group Open Source Speech Recognition Engines are used for automated speech recognition.

CMU Sphinx Knowledge Base Tool and CMU Statistical Modeling Toolkit are used for building language models.

LDC sph2pipe is used to convert SPHERE-format audio (.sph) to WAV format.

MySQL Open Source Database is used as a relational database.

NIST Spoken Language Technology Evaluation and Utility Speech Recognition Scoring Toolkit (SCTK) is used for calculating word error rates.

ESPS from KTH. ESPS includes the get_f0 program used by compparef0.sh to measure pitch and energy of an audio file.
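As background for the word error rate mentioned in the SCTK entry above, the following is a minimal sketch of the textbook definition: the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. It is not SCTK's implementation, which applies its own alignment and scoring options that are not reproduced here. The function name and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing
            insertion = current[j - 1] + 1   # extra hypothesis word
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: the recognizer misses the distorted word.
    reference = "i want to fly to chicago"
    hypothesis = "i want to fly to"
    print(word_error_rate(reference, hypothesis))
```

In the context of this project, a distortion that defeats the recognizer on a single selected word would show up as an error confined to that word, while the rest of the utterance would still be recognized.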

Sphinx 3 configuration used by this project

privrec.tar.gz

Custom software developed as part of this project

(see comments in each file for descriptions)

Audio samples from the CU Communicator Corpus used on this site are provided with the written permission of Colorado University - Boulder, CSLR Group, which owns the CU Communicator Corpus.