
SaTC PI Meeting, 2012

hʊkt ɔn fɒnɪks
Learning to Read Encrypted VoIP Conversations
Fabian Monrose

NSF SaTC Meeting, 2012

Fabian Monrose


Voice over IP (VoIP)
•  Popular replacement for traditional telephony
•  Many free, or inexpensive, services available
• very reliable
• easy to use


VoIP Security
•  Security and privacy implications still not well understood
•  Two channels: voice and control
•  Majority of security analyses focus on control channel
• e.g., caller ID spoofing, registration hijacking, denial of service
We are interested in the privacy of the voice channel
[Figure: voice and control channels across the Internet]


Information leakage
Overlooked interaction of two design decisions:
•  compression: variable-bit-rate (VBR) codecs
• compress different sounds with varying fidelity
•  encryption: length-preserving stream ciphers
[Figure labels: COMPRESSION, ENCRYPTION]


Information leakage

Result: packet sizes reflect properties of the input signal
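The leak can be illustrated with a minimal sketch. It assumes a toy XOR stream cipher built from SHAKE-256 (not the cipher of any real VoIP stack) and made-up frame sizes; the names `keystream` and `encrypt_frame` are hypothetical. The point is only that a length-preserving cipher hides the payload but not the per-frame size chosen by a VBR codec.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy keystream: n bytes of SHAKE-256 over key || nonce (illustration only)."""
    return hashlib.shake_256(key + nonce).digest(n)

def encrypt_frame(key: bytes, seq: int, frame: bytes) -> bytes:
    """XOR the frame with a per-packet keystream; output length equals input length."""
    ks = keystream(key, seq.to_bytes(8, "big"), len(frame))
    return bytes(a ^ b for a, b in zip(frame, ks))

# Hypothetical VBR frame sizes in bytes (made up for illustration).
frame_sizes = [10, 15, 20, 28, 38, 20, 10]
frames = [bytes(size) for size in frame_sizes]

key = b"illustrative key only"
packets = [encrypt_frame(key, i, f) for i, f in enumerate(frames)]

# An eavesdropper cannot read the payloads, but sees exactly the codec's size sequence.
observed_sizes = [len(p) for p in packets]
print(observed_sizes)
```

Padding every frame to a fixed size would remove this signal, at the cost of the bandwidth savings that motivate VBR in the first place.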


How bad is this leak?
• Sufficient to determine:
•  the language spoken (2007): Wright et al., Language identification of encrypted VoIP traffic: Alejandra y Roberto or Alice and Bob?, USENIX Security
•  spoken phrases (2008): Wright et al., Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations, IEEE S&P
•  the speaker (2009): Backes et al., Speaker recognition in encrypted VoIP streams, ESORICS, 2009.

Prior work did not take advantage of language-specific constraints or permitted sequences (i.e., “phonotactics”)


•  Infants use perceptual, social, and linguistic cues to segment the stream of sounds
•  use learned knowledge of well-formedness
• amazingly, infants learn these rudimentary constraints while simultaneously segmenting words
•  use familiar words (e.g., their own name, mama, etc.) to identify new words in a stream
Blanchard et al. Modeling the contribution of phonotactic cues to the problem of word segmentation. Journal of Child Language, 2010.
Bortfeld et al. Mommy and me: Familiar names help launch babies into speech-stream segmentation. Psychological Science, 2005.


Step 1: phonetic segmentation
  

  

[Figure: IPA pronunciation of the phrase “an official deadline”]

Observation: frame sizes differ in response to phoneme transitions
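The observation can be sketched as a naive change-point heuristic: start a new segment whenever the frame size jumps by at least some threshold. This is a simplification for illustration, not the segmentation method used in this work; the function name `segment_frames`, the threshold `min_jump`, and the sample sizes are all hypothetical.

```python
def segment_frames(sizes, min_jump=5):
    """Group consecutive frame sizes into segments, splitting where the
    size changes by at least min_jump bytes (a crude boundary detector)."""
    if not sizes:
        return []
    segments = [[sizes[0]]]
    for prev, cur in zip(sizes, sizes[1:]):
        if abs(cur - prev) >= min_jump:
            segments.append([cur])    # large jump: assume a phoneme transition
        else:
            segments[-1].append(cur)  # small change: same segment continues
    return segments

# Made-up frame sizes in bytes, for illustration only.
sizes = [20, 20, 22, 38, 38, 36, 10, 10, 28, 28]
segments = segment_frames(sizes)
print(segments)
```

A fixed threshold is brittle on real traffic; a learned boundary classifier over a window of neighboring frames is the more plausible design.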


Step 2: phoneme classification

Observation: differing sounds are encoded at different bit rates (e.g., the Speex codec uses only 9 different bit rates in narrow-band mode and 21 in wide-band mode)
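Because the codec picks from a small set of bit rates, each segment can be summarized as a histogram over the observed frame sizes and compared against per-class profiles. The sketch below is a nearest-profile classifier under stated assumptions: the frame-size alphabet `SYMBOLS`, the class labels, and the profile values in `PROFILES` are invented for illustration and are not measurements; the classifier used in this work is not specified on the slide.

```python
from collections import Counter

# Hypothetical frame-size alphabet (bytes) and class profiles (made up).
SYMBOLS = [10, 20, 28, 38]
PROFILES = {
    "vowel-like":     [0.0, 0.1, 0.3, 0.6],
    "fricative-like": [0.1, 0.6, 0.3, 0.0],
    "silence-like":   [0.9, 0.1, 0.0, 0.0],
}

def histogram(segment):
    """Relative frequency of each known frame size within one segment."""
    counts = Counter(segment)
    return [counts[s] / len(segment) for s in SYMBOLS]

def classify(segment):
    """Return the label whose profile is closest in squared Euclidean distance."""
    h = histogram(segment)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(h, PROFILES[label]))

    return min(PROFILES, key=distance)

labels = [classify(seg) for seg in ([38, 38, 28], [10, 10, 10], [20, 20, 28])]
print(labels)
```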


Step 3: Word break insertion
Based on language-specific constraints on phoneme order
•  insert potential word breaks into impossible phonetic triplets
• e.g., [ɪŋw] (blessing way)
•  resolve invalid word beginnings / endings
• e.g., [zdr] (eavesdrop)
•  improvement: split resulting segments by dictionary search
Harrington et al. Word boundary identification from phoneme sequence constraints in automatic continuous speech recognition. Computational Linguistics, 1988.
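The two rules above can be sketched as follows. This is a toy under stated assumptions: the sets `IMPOSSIBLE_TRIPLETS` and `VALID_ONSETS` contain only enough entries to cover the two slide examples, and the tie-breaking rule (prefer the split that leaves a valid word beginning) is a simplification rather than the full procedure of Harrington et al.

```python
# Hypothetical, tiny constraint sets covering only the slide's examples.
IMPOSSIBLE_TRIPLETS = {("ɪ", "ŋ", "w"), ("z", "d", "r")}
VALID_ONSETS = {("w",), ("r",), ("d", "r")}

def insert_breaks(phonemes):
    """Return sorted indices i where a word break is placed before phonemes[i]."""
    breaks = set()
    for i in range(len(phonemes) - 2):
        if tuple(phonemes[i:i + 3]) not in IMPOSSIBLE_TRIPLETS:
            continue
        # A triplet a-b-c can be split as a|bc or ab|c. Prefer a|bc when
        # "bc" is a valid word beginning; otherwise fall back to ab|c.
        if tuple(phonemes[i + 1:i + 3]) in VALID_ONSETS:
            breaks.add(i + 1)
        else:
            breaks.add(i + 2)
    return sorted(breaks)

def render(phonemes, breaks):
    """Join phonemes with spaces, marking each word break with '|'."""
    out = []
    for i, p in enumerate(phonemes):
        if i in breaks:
            out.append("|")
        out.append(p)
    return " ".join(out)

blessing_way = ["b", "l", "ɛ", "s", "ɪ", "ŋ", "w", "eɪ"]
eavesdrop = ["i", "v", "z", "d", "r", "ɑ", "p"]
print(render(blessing_way, insert_breaks(blessing_way)))
print(render(eavesdrop, insert_breaks(eavesdrop)))
```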


Stage 4: Word Matching
•  Find the closest pronunciation using an edit distance weighted by the articulatory distance between phonemes

Vowels characterized by tongue position and lip shape (height, backness, rounding)
Consonants characterized by restriction of airflow (place, manner)
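One way to make this concrete is a weighted Levenshtein distance in which substituting one phoneme for another costs the fraction of articulatory features on which they differ. The sketch below is an assumption-laden illustration: the feature tables are tiny and simplified, voicing is added as a third consonant feature that the slide does not mention, and the costs and lexicon are hypothetical rather than those of the actual phonetic edit distance.

```python
# Hypothetical, heavily simplified feature tables (illustration only).
# Consonants: (place, manner, voicing); voicing is an added assumption.
CONSONANTS = {
    "b": ("bilabial", "stop", "voiced"),
    "t": ("alveolar", "stop", "voiceless"),
    "d": ("alveolar", "stop", "voiced"),
    "z": ("alveolar", "fricative", "voiced"),
    "k": ("velar", "stop", "voiceless"),
}
# Vowels: (height, backness, rounding).
VOWELS = {
    "i": ("high", "front", "unrounded"),
    "æ": ("low", "front", "unrounded"),
    "u": ("high", "back", "rounded"),
}

def substitution_cost(a, b):
    """Fraction of differing features; 1.0 across classes or for unknown symbols."""
    if a == b:
        return 0.0
    for table in (CONSONANTS, VOWELS):
        if a in table and b in table:
            fa, fb = table[a], table[b]
            return sum(x != y for x, y in zip(fa, fb)) / len(fa)
    return 1.0

def phonetic_edit_distance(src, dst, indel=1.0):
    """Levenshtein distance with feature-weighted substitutions."""
    rows, cols = len(src) + 1, len(dst) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel
    for j in range(1, cols):
        dist[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel,                                   # deletion
                dist[i][j - 1] + indel,                                   # insertion
                dist[i - 1][j - 1] + substitution_cost(src[i - 1], dst[j - 1]),
            )
    return dist[-1][-1]

def closest_word(observed, lexicon):
    """Return the lexicon word whose pronunciation is nearest to the observed phonemes."""
    return min(lexicon, key=lambda w: phonetic_edit_distance(observed, lexicon[w]))

LEXICON = {"cat": ["k", "æ", "t"], "bat": ["b", "æ", "t"], "keys": ["k", "i", "z"]}
guess = closest_word(["k", "æ", "d"], LEXICON)
print(guess)
```

With uniform substitution costs, "cat", "bat", and "keys" would be harder to tell apart; weighting by features lets a near-miss such as [d] for [t] count as a small error.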


Stage 4: Word Matching
(Or, how we spent the summer of 2011)
Katherine Shaw, Elliott Moreton, Austin Matthews

Phonetic Edit Distance


Evaluation
•  630 speakers, 8 major dialects of American English
•  Score hypotheses using well-studied techniques for modeling the adequacy and fluency of a translation
• penalizes fragmentation by matching contiguous subsequences (i.e., fluency)
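The fragmentation penalty can be sketched with the original METEOR formulation (Banerjee and Lavie, 2005): a recall-weighted harmonic mean of unigram precision and recall, discounted by how many contiguous chunks the matches fall into. This is a simplified illustration with exact word matches and a greedy alignment only; it omits stemming and synonym matching, so it will not reproduce the scores reported in this talk. The function names are hypothetical.

```python
def align(hyp, ref):
    """Greedy exact-match alignment: each hypothesis word takes the first
    unused reference position holding the same word."""
    used, pairs = set(), []
    for i, word in enumerate(hyp):
        for j, ref_word in enumerate(ref):
            if j not in used and ref_word == word:
                used.add(j)
                pairs.append((i, j))
                break
    return pairs

def count_chunks(pairs):
    """Number of maximal runs that are contiguous in both hypothesis and reference."""
    chunks, prev = 0, None
    for i, j in pairs:
        if prev is None or i != prev[0] + 1 or j != prev[1] + 1:
            chunks += 1
        prev = (i, j)
    return chunks

def meteor_like(hyp, ref):
    """Fmean * (1 - penalty), with Fmean = 10PR / (R + 9P) and
    penalty = 0.5 * (chunks / matches) ** 3."""
    pairs = align(hyp, ref)
    matches = len(pairs)
    if matches == 0:
        return 0.0
    precision = matches / len(hyp)
    recall = matches / len(ref)
    fmean = 10 * precision * recall / (recall + 9 * precision)
    penalty = 0.5 * (count_chunks(pairs) / matches) ** 3
    return fmean * (1 - penalty)

reference = "don't ask me to carry an oily rag like that".split()
in_order = "don't ask me to carry an oily rag like dark".split()
scrambled = list(reversed(reference))

print(round(meteor_like(in_order, reference), 3))
print(round(meteor_like(scrambled, reference), 3))
```

The scrambled hypothesis contains every reference word yet scores far lower, because each match forms its own chunk; that is the sense in which the metric rewards fluency as well as adequacy.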

[Figure: METEOR Score Interpretation (Lavie, 2010), scale from .1 to .9 with regions marked UNDERSTANDABLE and GOOD/FLUENT]


Hypotheses
Context dependent results
Reference (SA2): Don't ask me to carry an oily rag like that
Hypothesis                                        Score
Don't asked me to carry an oily rag like that     0.98
Don't ask me to carry an oily rag like dark       0.82
Don't asked me to carry and oily rag like dark    0.80

Context independent results
Reference                                   Hypothesis                                Score
Change involves the displacement of form.   Codes involves the displacement of aim.   0.57
Artificial intelligence is for real.        Artificial intelligence is carry all.     0.49
Bitter unreasoning jealousy.                Bitter unreasoning dignity.               0.47

[Figure: METEOR Score Interpretation (Lavie, 2010), scale from .1 to .9 with regions marked UNDERSTANDABLE and GOOD/FLUENT]

credit: W. Diffie, S. Landau

Summary
•  VoIP is here to stay, but security and privacy issues should not be overlooked
• quality of reconstructed transcripts better than expected
• will improve with advancements in computational linguistics
•  We need stronger interdisciplinary partnerships in order to design more secure and efficient solutions
See: A. White, K. Snow, A. Matthews, F. Monrose. Phonotactic Reconstruction of Encrypted VoIP Conversations: hʊkt ɔn fɒnɪks. IEEE Symposium on Security & Privacy, 2011.


Ongoing Partnerships
•  Closer partnership with Linguistics Department
• exploring new ways of computing phonotactic probability (w/ Elliott Moreton, Katherine Shaw, Jennifer Smith, Andrew White)
• Linguists are interested in generating and rating new blends
• many applications in Computer Security

•  Great learning experience!
• English is far more complex than I ever imagined
• e.g., differences in written and spoken form (codas, onsets, nuclei, rhyme, etc.)

• Strikingly different lab culture and research meeting practices