Audio demo for speaker adaptation based on phone-level speaker embeddings
Here we provide audio samples generated with different speaker embedding methods. The abbreviations are explained below:
- OracleVocode: original acoustic features resynthesized by the vocoder
- Xvec: uses an x-vector as the speaker embedding
- UttEmb: uses a reference encoder to extract an utterance-level speaker embedding from reference audio
- Attentron: uses an attention-based reference encoder to extract frame-level speaker embeddings from reference audio
- PhnEmb(proposed): uses a predictor to obtain phone-level speaker embeddings
- +Adapt: additionally updates the LSTM in the decoder
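
To make the difference in granularity between UttEmb and PhnEmb concrete, here is a minimal sketch. It is not the implementation behind these samples: the function names, the toy values, and the choice to condition by concatenation are all assumptions made for illustration. It only shows that an utterance-level method shares one speaker vector across all phones, while a phone-level method supplies one speaker vector per phone.

```python
from typing import List

Vector = List[float]


def condition_utterance_level(phone_states: List[Vector], spk: Vector) -> List[Vector]:
    """Append one shared speaker vector to every phone encoder state.

    Concatenation is an assumed conditioning scheme, used here only
    to illustrate the shapes involved.
    """
    return [state + spk for state in phone_states]


def condition_phone_level(
    phone_states: List[Vector], spk_per_phone: List[Vector]
) -> List[Vector]:
    """Append a separate speaker vector to each phone encoder state."""
    if len(phone_states) != len(spk_per_phone):
        raise ValueError("need exactly one speaker embedding per phone")
    return [state + spk for state, spk in zip(phone_states, spk_per_phone)]


# Toy data (hypothetical values): 3 phones with 2-dimensional encoder states.
phone_states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
utt_emb = [1.0, 1.0]                                # one vector per utterance
phn_embs = [[1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]     # one vector per phone

utt_out = condition_utterance_level(phone_states, utt_emb)
phn_out = condition_phone_level(phone_states, phn_embs)
```

In both cases the output has one conditioned vector per phone; only the source of the speaker information differs.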
Test speaker 1 `KHW`

| Text | - | - | - |
| --- | --- | --- | --- |
| OracleVocode | | | |
| Xvec | | | |
| UttEmb | | | |
| Attentron | | | |
| PhnEmb(proposed) | | | |

Test speaker 1 `KHW` with adaptation

| Text | - | - | - |
| --- | --- | --- | --- |
| OracleVocode | | | |
| Xvec+Adapt | | | |
| Attentron+Adapt | | | |
| UttEmb+Adapt | | | |
| PhnEmb+Adapt(proposed) | | | |

Test speaker 2 `HJX`

| Text | - | - | - |
| --- | --- | --- | --- |
| OracleVocode | | | |
| Xvec | | | |
| UttEmb | | | |
| Attentron | | | |
| PhnEmb(proposed) | | | |

Test speaker 2 `HJX` with adaptation

| Text | - | - | - |
| --- | --- | --- | --- |
| OracleVocode | | | |
| Xvec+Adapt | | | |
| UttEmb+Adapt | | | |
| Attentron+Adapt | | | |
| PhnEmb+Adapt(proposed) | | | |