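EpR (Excitation plus Resonance) is a model for constructing the spectral shape of the voice. It was published in 2001 as joint work between the Music Technology Group of Universitat Pompeu Fabra and the Yamaha Advanced System Development Center. The Vocaloid singing-synthesis software uses it to handle transitions between sounds and to change voice characteristics.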
[1] Bonada, Jordi, et al. "Singing voice synthesis combining excitation plus resonance and sinusoidal plus residual models." Proceedings of the International Computer Music Conference, 2001.
[2] Bonada Sanjaume, Jordi. "Voice processing and synthesis by performance sampling and spectral models." PhD dissertation, Universitat Pompeu Fabra, 2008.
[3] Serra, Xavier. "A System for Sound Analysis/Transformation/Synthesis based on a Deterministic plus Stochastic Decomposition." PhD thesis, Stanford University, 1989.
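The left part of the diagram is a little cluttered. What it describes is this: given a voice signal (OV: Original Voice) whose characteristics we want to modify, we analyze OV to obtain its spectral envelope (OSE: Original Spectral Envelope). At the same time we fit the model parameters to OV (OP: Original Parameters) and use OP to generate the EOE (Estimated Original Envelope). Subtracting EOE from OSE gives the differential envelope: DE = OSE − EOE.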
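A minimal Python sketch of the DE computation, assuming every envelope is stored as a dB magnitude array sampled on one shared frequency grid (the post does not fix a representation, so this is an assumption). The function name `difference_envelope` and the toy values are hypothetical. Because the envelopes are in dB, the subtraction corresponds to dividing linear magnitudes, and adding DE back onto EOE recovers OSE exactly.

```python
import numpy as np

# Assumption: envelopes are dB magnitudes on the same frequency grid.
# difference_envelope is a hypothetical helper name, not from the EpR papers.


def difference_envelope(ose_db, eoe_db):
    """DE = OSE - EOE: the detail of the measured envelope that the model fit misses."""
    ose_db = np.asarray(ose_db, dtype=float)
    eoe_db = np.asarray(eoe_db, dtype=float)
    if ose_db.shape != eoe_db.shape:
        raise ValueError("OSE and EOE must be sampled on the same frequency grid")
    return ose_db - eoe_db


# Toy dB values at four frequency bins
ose = np.array([-10.0, -14.0, -22.0, -30.0])
eoe = np.array([-11.0, -13.5, -20.0, -31.0])
de = difference_envelope(ose, eoe)
print(de)
```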
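If we give EpR a set of new parameters (NP: New Parameters), EpR internally uses NP to generate the ENE (Estimated New Envelope), adds DE to ENE, and outputs the result as the final spectral envelope SE (Spectral Envelope). If NP = OP, then ENE = EOE, so SE = DE + ENE = DE + EOE = OSE. In that case the synthesized voice sounds identical to the original (identical only to the ear, because phase is ignored).
2. Generating the Spectral Envelope
The EpR filter consists of three sub-filters. Besides the Differential Filter mentioned above, the Estimated Envelope can be decomposed into the sum of two filters:
1. Excitation
2. Resonance
Hence the name Excitation plus Resonance.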
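Below is a rough sketch of how the three parts could be combined into one envelope and of the NP = OP round trip. Several details are assumptions of this sketch, not taken from [1] or [2]: the exponential excitation curve Gain + SlopeDepth·(e^{Slope·f} − 1), a Klatt-style two-pole resonator for each resonance, a 44.1 kHz sample rate, and summing the resonances in linear amplitude before converting them to dB. All function names and parameter values are hypothetical.

```python
import numpy as np

FS = 44100.0  # assumed sample rate (Hz) for the digital resonators


def excitation_db(f, gain_db, slope, slope_depth_db):
    """Assumed excitation curve: gain_db at 0 Hz, falling toward gain_db - slope_depth_db when slope < 0."""
    return gain_db + slope_depth_db * (np.exp(slope * f) - 1.0)


def resonance_lin(f, center, bandwidth, amp):
    """Assumed Klatt-style two-pole resonator, rescaled so that |R(center)| == amp."""
    t = 1.0 / FS
    c = -np.exp(-2.0 * np.pi * bandwidth * t)
    b = 2.0 * np.exp(-np.pi * bandwidth * t) * np.cos(2.0 * np.pi * center * t)

    def mag(freq):
        z1 = np.exp(-2j * np.pi * freq * t)  # z^-1 evaluated on the unit circle
        return np.abs(1.0 / (1.0 - b * z1 - c * z1 ** 2))

    return amp * mag(f) / mag(center)


def estimated_envelope_db(f, params):
    """Excitation (dB) plus the linearly summed resonances converted to dB."""
    res = sum(resonance_lin(f, *r) for r in params["resonances"])
    return excitation_db(f, *params["excitation"]) + 20.0 * np.log10(res)


def epr_envelope_db(f, params, de_db):
    """SE = ENE + DE, where ENE is generated from params."""
    return estimated_envelope_db(f, params) + de_db


# Usage: with NP == OP the output SE reproduces OSE.
f = np.linspace(0.0, 8000.0, 512)
op = {"excitation": (0.0, -0.0005, 40.0),
      "resonances": [(700.0, 90.0, 1.0), (1200.0, 110.0, 0.6)]}
eoe = estimated_envelope_db(f, op)
ose = eoe + 0.3 * np.sin(f / 300.0)  # toy "measured" envelope with extra detail
de = ose - eoe
se = epr_envelope_db(f, op, de)
print(np.max(np.abs(se - ose)))
```

Passing a different parameter dictionary as NP to `epr_envelope_db` keeps the same DE, which is how the measured detail survives a parameter change in this sketch.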
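Paper [1] does not explain how these three parameters (Gain, Slope and SlopeDepth) are obtained; it only says "This curve is obtained from an approximation to the harmonic spectral shape (HSS) determined by the harmonics identified in the SMS analysis". Paper [2] states: "Gain, Slope and SlopeDepth values are obtained from a linear regression of the harmonic peaks in the logarithmic frequency spectrum." I am experimenting with extracting the Excitation parameters automatically and will discuss this at the end of the post.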
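For experimenting, here is one possible way to get all three Excitation parameters out of harmonic peaks. It is a hypothetical sketch, not the procedure of [1] or [2] and not the method discussed at the end of this post. It assumes an excitation curve of the form gain + depth·(e^{slope·f} − 1) with peak amplitudes in dB. For a fixed slope that curve is linear in gain and depth, so the sketch runs an ordinary least-squares regression for every slope on a search grid and keeps the best fit. The name `fit_excitation` and the default grid range are assumptions.

```python
import numpy as np


def fit_excitation(peak_freqs, peak_db, slopes=None):
    """Hypothetical fit of (gain_db, slope, slope_depth_db) to harmonic peaks.

    Assumed model: peak_db ~ gain + depth * (exp(slope * f) - 1).
    For a fixed slope the model is linear in (gain, depth), so each candidate
    slope is solved by ordinary least squares and the lowest residual wins.
    """
    f = np.asarray(peak_freqs, dtype=float)
    y = np.asarray(peak_db, dtype=float)
    if slopes is None:
        slopes = -np.logspace(-5, -2, 200)  # assumed search range, in 1/Hz
    best = None
    for s in slopes:
        design = np.column_stack([np.ones_like(f), np.exp(s * f) - 1.0])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        err = float(np.sum((design @ coef - y) ** 2))
        if best is None or err < best[0]:
            best = (err, coef[0], float(s), coef[1])
    _, gain, slope, depth = best
    return float(gain), slope, float(depth)


# Usage on synthetic harmonic peaks (f0 = 220 Hz, 30 harmonics)
harmonics = 220.0 * np.arange(1, 31)
peaks_db = 3.0 + 35.0 * (np.exp(-0.0005 * harmonics) - 1.0)
gain, slope, depth = fit_excitation(harmonics, peaks_db,
                                    slopes=np.linspace(-0.002, -0.0001, 20))
print(gain, slope, depth)
```

The grid search keeps the code short and avoids a nonlinear optimizer; a finer grid or a local refinement around the best slope would trade speed for accuracy. On real voices the harmonic peaks also carry the resonances, so fitting raw peaks like this would likely bias the result.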