Phone-level speaker embedding based speaker adaptation method audio demo
 Here we provide audio samples generated with different speaker embedding methods. The abbreviations used in the tables are explained below.
-  OracleVocode: original acoustic features resynthesized by the vocoder
 
-  Xvec: uses an x-vector as the speaker embedding
 
-  UttEmb: uses a reference encoder to extract an utterance-level speaker embedding from reference audio
 
-  Attentron: uses an attention-based reference encoder to extract frame-level speaker embeddings from reference audio
 
-  PhnEmb(proposed): uses a predictor to obtain phone-level speaker embeddings
 
-  +Adapt: also updates the LSTM in the decoder
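
The methods above differ mainly in the granularity of the speaker embedding: one vector per utterance, per frame, or per phone. The sketch below illustrates only that difference in shape. It is not the model used for these samples: plain mean pooling stands in for the learned reference encoder and predictor, and the frame vectors, phone spans, and function names are hypothetical.

```python
def utterance_embedding(frames):
    """Average all frame vectors into a single vector (one per utterance)."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def phone_embeddings(frames, phone_spans):
    """Average the frames inside each (start, end) span (one vector per phone)."""
    return [utterance_embedding(frames[start:end]) for start, end in phone_spans]


# Hypothetical input: 4 frames of 2-dimensional features covering 2 phones.
frames = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
spans = [(0, 2), (2, 4)]  # assumed phone boundaries, in frame indices

utt = utterance_embedding(frames)      # a single vector for the whole utterance
phn = phone_embeddings(frames, spans)  # one vector per phone
```

A frame-level embedding, as in Attentron, would instead keep one vector per frame, so a phone-level embedding sits between the utterance-level and frame-level cases in resolution.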
 
Test speaker 1 `KHW`

| Text | - | - | - |
| --- | --- | --- | --- |
| OracleVocode | | | |
| Xvec | | | |
| UttEmb | | | |
| Attentron | | | |
| PhnEmb(proposed) | | | |

Test speaker 1 `KHW` with adaptation

| Text | - | - | - |
| --- | --- | --- | --- |
| OracleVocode | | | |
| Xvec+Adapt | | | |
| Attentron+Adapt | | | |
| UttEmb+Adapt | | | |
| PhnEmb+Adapt(proposed) | | | |

Test speaker 2 `HJX`

| Text | - | - | - |
| --- | --- | --- | --- |
| OracleVocode | | | |
| Xvec | | | |
| UttEmb | | | |
| Attentron | | | |
| PhnEmb(proposed) | | | |

Test speaker 2 `HJX` with adaptation

| Text | - | - | - |
| --- | --- | --- | --- |
| OracleVocode | | | |
| Xvec+Adapt | | | |
| UttEmb+Adapt | | | |
| Attentron+Adapt | | | |
| PhnEmb+Adapt(proposed) | | | |