SaTC PI Meeting, 2012
hʊkt ɔn fɒnɪks
Learning to Read Encrypted VoIP Conversations
Fabian Monrose
NSF SaTC Meeting, 2012
Fabian Monrose
Voice over IP (VoIP)
• Popular replacement for traditional telephony
• Many free, or inexpensive, services available
• very reliable
• easy to use
VoIP Security
• Security and privacy implications still not well understood
• Two channels: voice and control
• Majority of security analyses focus on control channel
• e.g., caller ID spoofing, registration hijacking, denial of service
We are interested in the privacy of the voice channel
[Figure labels: voice, Internet, control]
Information leakage
Overlooked interaction of two design decisions:
• compression: variable-bit-rate (VBR) codecs
• compress different sounds with varying fidelity
• encryption: length-preserving stream ciphers
[Figure labels: COMPRESSION, ENCRYPTION]
Information leakage
Result: packet sizes reflect properties of the input signal
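A minimal sketch of why the sizes leak, assuming a toy XOR keystream built from SHAKE-256 as a stand-in for whatever stream cipher a real VoIP stack uses. The frame sizes are illustrative values, not real codec output, and the helper names are hypothetical.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream derived with SHAKE-256 (stand-in for a real stream cipher)."""
    return hashlib.shake_256(key + nonce).digest(length)

def encrypt_frame(key: bytes, nonce: bytes, frame: bytes) -> bytes:
    """XOR the frame with the keystream; the output is exactly as long as the input."""
    ks = keystream(key, nonce, len(frame))
    return bytes(a ^ b for a, b in zip(frame, ks))

# Hypothetical VBR frame sizes in bytes (illustrative only).
frame_sizes = [10, 15, 20, 28, 20, 10]
key = b"demo-key"
frames = [bytes(size) for size in frame_sizes]
ciphertexts = [
    encrypt_frame(key, i.to_bytes(4, "big"), frame)
    for i, frame in enumerate(frames)
]

# An eavesdropper cannot read the payloads but still sees every length.
observed_sizes = [len(c) for c in ciphertexts]
print(observed_sizes)
```

The payload bytes are hidden, but the sequence of lengths passes through the cipher unchanged, which is the side channel the talk exploits.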
How bad is this leak?
• Sufficient to determine:
• 2007: Wright et al., Language identification of encrypted VoIP traffic: Alejandra y Roberto or Alice and Bob?, USENIX Security
• 2008: Wright et al., Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations, IEEE S&P
• 2009: Backes et al., Speaker recognition in encrypted VoIP streams, ESORICS, 2009.
Prior work did not take advantage of language-specific constraints or permitted sequences (i.e., “phonotactics”)
• Infants use perceptual, social, and linguistic cues to segment the stream of sounds
• use learned knowledge of well-formedness
• amazingly, infants learn these rudimentary constraints while simultaneously segmenting words
• use familiar words (e.g., their own name, mama, etc.) to identify new words in a stream
Blanchard et al. Modeling the contribution of phonotactic cues to the problem of word segmentation. Journal of Child Language, 2010.
Bortfeld et al. Mommy and me: Familiar names help launch babies into speech-stream segmentation. Psychological Science, 2005.
Step 1: phonetic segmentation
[Figure: IPA pronunciation of the phrase “an official deadline”]
Observation: frame sizes differ in response to phoneme transitions
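A rough sketch of the intuition only: it cuts the stream wherever the observed frame size changes. This naive run-length heuristic is an assumption made for illustration, not the segmentation method used in the talk, and the frame sizes are made up.

```python
def segment_by_size_change(frame_sizes):
    """Group consecutive frames of equal size into (start, end, size) runs.

    `end` is exclusive, so frame_sizes[start:end] is the run itself.
    """
    segments = []
    start = 0
    for i in range(1, len(frame_sizes) + 1):
        # Close the current run at the end of input or when the size changes.
        if i == len(frame_sizes) or frame_sizes[i] != frame_sizes[start]:
            segments.append((start, i, frame_sizes[start]))
            start = i
    return segments

# Hypothetical packet sizes observed on the wire (illustrative only).
sizes = [20, 20, 20, 28, 28, 10, 10, 10, 10, 15]
print(segment_by_size_change(sizes))
# → [(0, 3, 20), (3, 5, 28), (5, 9, 10), (9, 10, 15)]
```

A single phoneme can span frames of several sizes, so a practical segmenter would presumably need a model trained on labelled speech rather than this heuristic.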
Step 2: phoneme classification
Observation: differing sounds are encoded at different bit rates (e.g., the Speex codec uses only 9 different bit rates in narrowband mode and 21 in wideband mode)
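A toy illustration of the observation, assuming a nearest-centroid classifier over the mean frame size of a segment. The classifier choice, the coarse class labels, and every number below are hypothetical; the talk does not specify its classifier here.

```python
from collections import defaultdict
from statistics import mean

def train_centroids(labelled_segments):
    """Average the mean frame size per label from (label, frame_sizes) pairs."""
    by_label = defaultdict(list)
    for label, sizes in labelled_segments:
        by_label[label].append(mean(sizes))
    return {label: mean(values) for label, values in by_label.items()}

def classify(segment_sizes, centroids):
    """Return the label whose centroid is closest to the segment's mean frame size."""
    m = mean(segment_sizes)
    return min(centroids, key=lambda label: abs(centroids[label] - m))

# Hypothetical training data: coarse sound classes with made-up frame sizes.
training = [
    ("vowel", [28, 28, 20]),
    ("vowel", [28, 20, 28]),
    ("fricative", [15, 15, 10]),
    ("silence", [6, 6, 6]),
]
centroids = train_centroids(training)
print(classify([28, 28, 28], centroids))
print(classify([10, 15, 15], centroids))
```

With only a handful of distinct bit rates available, many phonemes share the same sizes, so a single-feature classifier like this one would be far too weak in practice; it only shows where the signal comes from.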
Step 3: Word break insertion
Based on language-specific constraints on phoneme order
• insert potential word breaks into impossible phonetic triplets
• e.g., [ɪŋw] (blessing way)
• resolve invalid word beginnings / endings
• e.g., [zdr] (eavesdrop)
• improvement: split resulting segments by dictionary search
Harrington et al. Word boundary identification from phoneme sequence constraints in automatic continuous speech recognition. Computational Linguistics, 1988.
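The triplet rule can be sketched as a table lookup over a sliding window. The table below is hypothetical and holds only the [ɪŋw] example from the slide; a real constraint table would be derived from a pronunciation dictionary, and the break offset stored for each triplet is an assumption of this sketch.

```python
# Hypothetical constraint table: each phoneme triplet that cannot occur inside a
# single word maps to the offset (within the triplet) where the break is placed.
IMPOSSIBLE_TRIPLETS = {
    ("ɪ", "ŋ", "w"): 2,  # "blessing way": break between ŋ and w
}

def insert_word_breaks(phonemes, triplets=IMPOSSIBLE_TRIPLETS):
    """Return sorted indices i such that a word break falls before phonemes[i]."""
    breaks = set()
    for i in range(len(phonemes) - 2):
        offset = triplets.get(tuple(phonemes[i:i + 3]))
        if offset is not None:
            breaks.add(i + offset)
    return sorted(breaks)

def split_at(phonemes, breaks):
    """Split the phoneme list into segments at the given break indices."""
    segments, start = [], 0
    for b in breaks:
        segments.append(phonemes[start:b])
        start = b
    segments.append(phonemes[start:])
    return segments

stream = ["b", "l", "ɛ", "s", "ɪ", "ŋ", "w", "eɪ"]
breaks = insert_word_breaks(stream)
print(split_at(stream, breaks))
# → [['b', 'l', 'ɛ', 's', 'ɪ', 'ŋ'], ['w', 'eɪ']]
```

Segments produced this way may still contain several words, which is where the dictionary-search refinement would come in.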
Step 4: Word matching
• Find closest pronunciation using an edit distance approach based on the articulatory distance between phonemes
Vowels characterized by tongue position and lip shape (height, backness, rounding)
Consonants characterized by restriction of airflow (place, manner)
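One way to picture such a distance is a Levenshtein computation whose substitution cost is the fraction of articulatory features on which two phonemes differ. The feature table, the cost rule, and the three-word dictionary below are assumptions made for illustration, not the metric developed for the talk.

```python
# Hypothetical feature table. Vowels: (height, backness, rounding);
# consonants: (place, manner).
FEATURES = {
    "i": ("vowel", "high", "front", "unrounded"),
    "ɪ": ("vowel", "near-high", "front", "unrounded"),
    "u": ("vowel", "high", "back", "rounded"),
    "p": ("consonant", "bilabial", "plosive"),
    "t": ("consonant", "alveolar", "plosive"),
    "s": ("consonant", "alveolar", "fricative"),
    "m": ("consonant", "bilabial", "nasal"),
}

def substitution_cost(a, b):
    """0 for identical phonemes, 1 across vowel/consonant, else share of differing features."""
    if a == b:
        return 0.0
    fa, fb = FEATURES[a], FEATURES[b]
    if fa[0] != fb[0]:
        return 1.0
    differing = sum(x != y for x, y in zip(fa[1:], fb[1:]))
    return differing / len(fa[1:])

def phonetic_edit_distance(s, t, indel=1.0):
    """Levenshtein distance over phoneme lists with feature-weighted substitutions."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * indel]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + indel,                        # delete a
                curr[j - 1] + indel,                    # insert b
                prev[j - 1] + substitution_cost(a, b),  # substitute a with b
            ))
        prev = curr
    return prev[-1]

def closest_word(phonemes, dictionary):
    """Return the dictionary word whose pronunciation is nearest to `phonemes`."""
    return min(dictionary, key=lambda w: phonetic_edit_distance(phonemes, dictionary[w]))

# Hypothetical three-word pronunciation dictionary.
DICTIONARY = {"tip": ["t", "ɪ", "p"], "soup": ["s", "u", "p"], "team": ["t", "i", "m"]}
print(closest_word(["t", "i", "p"], DICTIONARY))
```

Here the hypothesized phonemes [t i p] land on "tip" because [i] and [ɪ] differ in height only, whereas a plain edit distance would treat every substitution as equally costly.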
Step 4: Word matching
(Or, how we spent the summer of 2011)
Katherine Shaw, Elliott Moreton, Austin Matthews
Phonetic Edit Distance
Evaluation
• 630 speakers, 8 major dialects of American English
• Score hypotheses using well-studied techniques for
modeling the adequacy and fluency of a translation
• penalizes fragmentation by matching contiguous
subsequences (i.e., fluency)
[Figure: METEOR Score Interpretation (Lavie, 2010), scale from .1 to .9 with regions labeled UNDERSTANDABLE and GOOD/FLUENT]
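A simplified sketch of a METEOR-style score, assuming exact word matches only, a greedy alignment, and the original METEOR parameterization (recall-weighted F-mean with a cubic fragmentation penalty). The configuration used for the talk's results may differ, so this is not expected to reproduce the reported scores.

```python
def align(hyp, ref):
    """Greedy exact-match alignment; returns (hyp_index, ref_index) pairs."""
    used, pairs, prev_ref = set(), [], None
    for h, word in enumerate(hyp):
        candidates = [r for r, w in enumerate(ref) if w == word and r not in used]
        if not candidates:
            prev_ref = None
            continue
        # Prefer the reference position that extends the current chunk.
        if prev_ref is not None and prev_ref + 1 in candidates:
            r = prev_ref + 1
        else:
            r = candidates[0]
        used.add(r)
        pairs.append((h, r))
        prev_ref = r
    return pairs

def count_chunks(pairs):
    """Count maximal runs of matches that are adjacent in both sentences."""
    chunks, prev = 0, None
    for h, r in pairs:
        if prev is None or h != prev[0] + 1 or r != prev[1] + 1:
            chunks += 1
        prev = (h, r)
    return chunks

def meteor_like(hypothesis, reference):
    """F-mean of unigram precision and recall, discounted for fragmentation."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    pairs = align(hyp, ref)
    m = len(pairs)
    if m == 0:
        return 0.0
    precision, recall = m / len(hyp), m / len(ref)
    f_mean = 10 * precision * recall / (recall + 9 * precision)
    penalty = 0.5 * (count_chunks(pairs) / m) ** 3
    return f_mean * (1 - penalty)

reference = "don't ask me to carry an oily rag like that"
print(meteor_like("don't ask me to carry an oily rag like dark", reference))
print(meteor_like("like that don't ask me to carry an oily rag", reference))
```

Both hypotheses reuse reference words in long runs, but the second splits its matches into two chunks, so it pays a larger fragmentation penalty; that is the fluency component of the score.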
Hypotheses
Context-dependent results
Reference (SA2): Don't ask me to carry an oily rag like that
• Don't asked me to carry an oily rag like that (score 0.98)
• Don't ask me to carry an oily rag like dark (score 0.82)
• Don't asked me to carry and oily rag like dark (score 0.80)
Context-independent results
• Reference: Change involves the displacement of form.
  Hypothesis: Codes involves the displacement of aim. (score 0.57)
• Reference: Artificial intelligence is for real.
  Hypothesis: Artificial intelligence is carry all. (score 0.49)
• Reference: Bitter unreasoning jealousy.
  Hypothesis: Bitter unreasoning dignity. (score 0.47)
[Figure: METEOR Score Interpretation (Lavie, 2010), scale from .1 to .9 with regions labeled UNDERSTANDABLE and GOOD/FLUENT]
credit: W. Diffie, S. Landau
Summary
• VoIP is here to stay, but security and privacy issues should not be overlooked
• quality of reconstructed transcripts better than expected
• will improve with advancements in computational linguistics
• We need stronger interdisciplinary partnerships in order to design more secure and efficient solutions
See: A. White, K. Snow, A. Matthews, F. Monrose. Phonotactic Reconstruction of Encrypted VoIP Conversations: hʊkt ɔn fɒnɪks. IEEE Symposium on Security & Privacy, 2011.
Ongoing Partnerships
• Closer partnership with Linguistics Department
• exploring new ways of computing phonotactic probability (w/ Elliott Moreton, Katherine Shaw, Jennifer Smith, Andrew White)
• Linguists are interested in generating and rating new blends; many applications in Computer Security
• Great learning experience!
• English is far more complex than I ever imagined
• e.g., differences in written and spoken form (codas, onsets, nuclei, rhyme, etc.)
• Strikingly different lab culture and research meeting practices