6.3.6 HRLE

6.3.6.1 Outline of the node

This node estimates the stationary noise level using the Histogram-based Recursive Level Estimation (HRLE) method. HRLE computes a histogram (frequency distribution) of the input spectrum level and estimates the noise level as the level at which the normalized cumulative frequency of that distribution reaches the parameter $Lx$. The histogram is computed from past input spectra weighted with an exponential window, and the window position is updated every frame.

6.3.6.2 Necessary files

No files are required.

6.3.6.3 Usage

When to use

This node is used to estimate the noise spectrum when suppressing noise with spectral subtraction.

Typical connection

As shown in Figure 6.46, the input is connected after a separation node such as GHDSS, and the output is connected to a node that calculates the optimal gain, such as CalcSpecSubGain. Figure 6.47 shows a connection example in which EstimateLeak is also used.

Figure 6.46: Connection example of HRLE 1
Figure 6.47: Connection example of HRLE 2
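Downstream of HRLE, the estimated noise power is typically turned into a per-bin suppression gain. As a hedged illustration only, the sketch below computes a textbook power spectral subtraction gain G(k) = max(1 - n_p(k) / y_p(k), floor) from INPUT_SPEC and NOISE_SPEC, both represented as Python dicts that mirror Map<int, float>. This is not necessarily the rule implemented by CalcSpecSubGain. The function name `spec_sub_gain` and the `floor` argument are hypothetical.

```python
from typing import Dict


def spec_sub_gain(input_spec: Dict[int, float],
                  noise_spec: Dict[int, float],
                  floor: float = 0.0) -> Dict[int, float]:
    """Generic power spectral subtraction gain (hypothetical helper).

    For each frequency index k: G[k] = max(1 - n[k] / y[k], floor),
    where y is the input power and n the estimated noise power.
    """
    gains: Dict[int, float] = {}
    for k, y in input_spec.items():
        n = noise_spec.get(k, 0.0)  # missing noise entry: assume zero noise
        if y <= 0.0:
            gains[k] = floor  # no signal energy in this bin: use the floor
        else:
            gains[k] = max(1.0 - n / y, floor)
    return gains
```

For example, `spec_sub_gain({0: 4.0, 1: 1.0}, {0: 1.0, 1: 2.0})` returns `{0: 0.75, 1: 0.0}`: the second bin's noise estimate exceeds its input power, so its gain is clamped to the floor.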

6.3.6.4 Input-output and property of the node

Table 6.38: Parameter list of HRLE

| Parameter name | Type  | Default value | Unit | Description                                     |
|----------------|-------|---------------|------|-------------------------------------------------|
| LX             | float | 0.85          |      | Normalized cumulative frequency ($Lx$ value)    |
| TIME_CONSTANT  | float | 16000         | [pt] | Time constant                                   |
| NUM_BIN        | float | 1000          |      | Number of histogram bins                        |
| MIN_LEVEL      | float | -100          | [dB] | Minimum level of the histogram                  |
| STEP_LEVEL     | float | 0.2           | [dB] | Width of a histogram bin                        |
| DEBUG          | bool  | false         |      | Debugging mode                                  |
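With the defaults in Table 6.38, the histogram covers levels from MIN_LEVEL = -100 dB up to MIN_LEVEL + NUM_BIN x STEP_LEVEL = -100 + 1000 x 0.2 = 100 dB. The sketch below computes this range and maps a power value to a bin index as in Eqs. (48) and (49). The helper names are hypothetical, and clipping out-of-range indices to the first or last bin is an assumption, chosen to match the underflow/overflow behavior described for DEBUG. Note that `math.floor`, not `int()`, matches the floor operation for negative arguments.

```python
import math
from typing import Tuple


def level_range(num_bin: int = 1000, min_level: float = -100.0,
                step_level: float = 0.2) -> Tuple[float, float]:
    """Return the (lowest, highest) level in dB covered by the histogram."""
    return min_level, min_level + num_bin * step_level


def bin_index(power: float, num_bin: int = 1000, min_level: float = -100.0,
              step_level: float = 0.2) -> int:
    """Map a power value to a histogram bin index (hypothetical helper).

    Indices below 0 (underflow) or above num_bin - 1 (overflow) are clipped
    to the edge bins; this clipping is an assumption.
    """
    if power <= 0.0:
        return 0  # zero power has no dB level: treat it as underflow (assumption)
    level_db = 10.0 * math.log10(power)                     # Eq. (48)
    idx = math.floor((level_db - min_level) / step_level)  # Eq. (49)
    return min(max(idx, 0), num_bin - 1)
```

A power of 2.0 (about 3.01 dB) falls into bin 515, while powers of 1e-12 (-120 dB) and 1e12 (120 dB) lie outside the default range and end up in bins 0 and 999, respectively.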

Input

INPUT_SPEC

: Map<int, float> type. Power spectrum of the input signal.

Output

NOISE_SPEC

: Map<int, float> type. Power spectrum of the estimated noise.

Parameters

LX

: float type. The default value is 0.85. Designate the normalized cumulative frequency on the cumulative frequency distribution, in the range 0 to 1. A value of 0 estimates the minimum level, 1 the maximum level, and 0.5 the median.

TIME_CONSTANT

: float type. The default value is 16000. Designate the time constant (a positive value) in samples.

NUM_BIN

: float type. The default value is 1000. Designate the number of bins in the histogram.

MIN_LEVEL

: float type. The default value is -100. Designate the minimum level of the histogram in dB.

STEP_LEVEL

: float type. The default value is 0.2. Designate the width of a histogram bin in dB.

DEBUG

: bool type. The default value is false. Designate the debugging mode. In debugging mode (true), the values of the cumulative histogram are written to standard output as comma-separated text once every 100 frames.

The values are printed as a complex matrix with multiple rows and columns: rows correspond to frequency bins and columns to histogram bins. Each element is a complex value in parentheses, with the real part on the left and the imaginary part on the right. Since the cumulative histogram is real-valued, the imaginary parts are normally 0, although this is not guaranteed in future versions.

For speed, the value added to the cumulative histogram per sample is not 1; it grows exponentially. The cumulative histogram values therefore do not represent cumulative frequencies themselves.

If most of the values in each row are 0 and nonzero values appear only near the last column, the input levels exceed the level range of the histogram (overflow). In this case, increase some or all of NUM_BIN, MIN_LEVEL, and STEP_LEVEL. Conversely, if most of the values in each row are the same constant and different, smaller values appear only near the first column, the input levels fall below the level range of the histogram (underflow). In this case, decrease MIN_LEVEL. Example of the output:

---------- Compmat.disp()
----------
[(1.00005e-18,0), (1.00005e-18,0), (1.00005e-18,0), ...
, (1.00005e-18,0);
(0,0), (0,0), (0,0), ...
, (4.00084e-18,0);
...
(4.00084e-18,0), (4.00084e-18,0), (4.00084e-18,0), .., , (4.00084e-18,0)]
^T
Matrix size = 1000 x 257
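The note above that the per-sample increment grows exponentially can be understood as follows. Decaying every bin by α each frame, as in Eq. (50), and instead never decaying but multiplying the increment by 1/α each frame, produce histograms that differ only by the global factor (1 - α)α^t after t frames. The normalized cumulative distribution, and therefore the selected $Lx$ bin, is the same. The sketch below checks this numerically; it is an illustration under that assumption, not the node's actual bookkeeping, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def decayed_histogram(indices: Sequence[int], num_bin: int,
                      alpha: float) -> List[float]:
    """Eq. (50): decay every bin by alpha, then add (1 - alpha) to the current bin."""
    hist = [0.0] * num_bin
    for i in indices:
        hist = [alpha * h for h in hist]
        hist[i] += 1.0 - alpha
    return hist


def growing_increment_histogram(indices: Sequence[int], num_bin: int,
                                alpha: float) -> List[float]:
    """Never decay the bins; multiply the increment by 1/alpha each frame instead."""
    hist = [0.0] * num_bin
    increment = 1.0
    for i in indices:
        increment /= alpha
        hist[i] += increment
    return hist


def normalized_cumulative(hist: Sequence[float]) -> List[float]:
    """Cumulative sums divided by the total (all zeros for an empty histogram)."""
    total = sum(hist)
    if total == 0.0:
        return [0.0] * len(hist)
    out: List[float] = []
    running = 0.0
    for h in hist:
        running += h
        out.append(running / total)
    return out
```

For the bin sequence `[0, 1, 1, 3, 2, 1, 0, 3, 3, 3]` with α = 0.9, the raw totals are roughly 0.65 and 18.7, yet the two normalized cumulative distributions agree to rounding error. Because the increment grows without bound, such a scheme would presumably need occasional rescaling in a long-running implementation.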

6.3.6.5 Details of the node

Figure 6.48 shows the processing flow of HRLE. HRLE builds a level histogram from the input power and estimates the $Lx$ level from the cumulative distribution. As shown in Figure 6.49, the $Lx$ level is the level at which the normalized cumulative frequency of the distribution reaches $x$, where $x$ is a parameter. For example, $x=0$ estimates the minimum value, $x=1$ the maximum value, and $x=0.5$ the median.

Figure 6.48: Processing flow of HRLE
Figure 6.49: Estimation of $Lx$ value
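Without the histogram machinery, the $Lx$ level of a set of observed levels is simply their $x$-quantile. The hypothetical helper `lx_level` below picks the element at the nearest rank x(n - 1) of the sorted levels, so $x = 0$, $0.5$, and $1$ return the minimum, the median (for an odd count), and the maximum. HRLE approximates this with binned, exponentially weighted levels, so this is a conceptual sketch only; the nearest-rank rule is a choice made here.

```python
import math
from typing import Sequence


def lx_level(levels: Sequence[float], x: float) -> float:
    """Return the x-quantile of the observed levels (nearest-rank rule)."""
    if not levels:
        raise ValueError("levels must not be empty")
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be in [0, 1]")
    ordered = sorted(levels)
    rank = math.floor(x * (len(ordered) - 1) + 0.5)  # round to the nearest rank
    return ordered[rank]
```

For levels `[-60.0, -42.0, -50.0, -55.0, -45.0]` dB, $x = 0$ gives -60.0, $x = 0.5$ gives -50.0, $x = 0.85$ (the LX default) gives -45.0, and $x = 1$ gives -42.0.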

The processing in HRLE is described in detail by the following seven equations (corresponding to Figure 6.48). Here $t$ denotes time (frame index), $y_p$ the input power (INPUT_SPEC), and $n_p$ the estimated noise power (NOISE_SPEC). The histogram parameters $x$, $\alpha$, $L_{min}$, and $L_{step}$ denote the normalized cumulative frequency (LX), the decay coefficient determined by the time constant (TIME_CONSTANT), the minimum level of the histogram (MIN_LEVEL), and the level width of a bin (STEP_LEVEL), respectively. $N(t, l)$ is the histogram value of bin $l$, $S(t, l)$ its cumulative sum, $I_{max}$ the index of the highest bin, and $\delta(\cdot)$ the Kronecker delta. $\lfloor a \rfloor$ denotes the largest integer not exceeding $a$. All variables other than the parameters are functions of frequency, and the same processing is performed independently for each frequency; the frequency index is omitted from the equations for simplicity.

$$Y_L(t) = 10 \log_{10} y_p(t), \tag{48}$$
$$I_y(t) = \left\lfloor \frac{Y_L(t) - L_{min}}{L_{step}} \right\rfloor, \tag{49}$$
$$N(t, l) = \alpha N(t-1, l) + (1 - \alpha)\, \delta(l - I_y(t)), \tag{50}$$
$$S(t, l) = \sum_{k=0}^{l} N(t, k), \tag{51}$$
$$I_x(t) = \mathop{\rm argmin}_I \left| S(t, I_{max})\, x - S(t, I) \right|, \tag{52}$$
$$L_x(t) = L_{min} + L_{step} \cdot I_x(t), \tag{53}$$
$$n_p(t) = 10^{L_x(t)/10}. \tag{54}$$
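The following sketch strings Eqs. (48) to (54) together into a per-frequency estimator, with INPUT_SPEC and NOISE_SPEC represented as dicts that mirror Map<int, float>. It reads the bin search of Eq. (52) as choosing the first bin whose cumulative sum is closest to $x \cdot S(t, I_{max})$, with $x$ given as a fraction in [0, 1] like LX. Several details are assumptions: the class name `HRLESketch` is hypothetical, the decay coefficient `alpha` is passed in directly because its mapping from TIME_CONSTANT is not specified here, and out-of-range levels are clipped to the edge bins.

```python
import math
from typing import Dict, List


class HRLESketch:
    """Per-frequency HRLE noise estimator following Eqs. (48)-(54).

    Hypothetical class. alpha is the per-frame decay coefficient; how it is
    derived from TIME_CONSTANT is not modeled here (assumption).
    """

    def __init__(self, lx: float = 0.85, alpha: float = 0.99,
                 num_bin: int = 1000, min_level: float = -100.0,
                 step_level: float = 0.2) -> None:
        self.lx = lx
        self.alpha = alpha
        self.num_bin = num_bin
        self.min_level = min_level
        self.step_level = step_level
        # Histogram N(t, l) for each frequency index, created on first use.
        self.hist: Dict[int, List[float]] = {}

    def _bin_index(self, power: float) -> int:
        if power <= 0.0:
            return 0  # zero power has no dB level: treat as underflow (assumption)
        level_db = 10.0 * math.log10(power)                              # Eq. (48)
        idx = math.floor((level_db - self.min_level) / self.step_level)  # Eq. (49)
        return min(max(idx, 0), self.num_bin - 1)  # clip to edge bins (assumption)

    def process(self, input_spec: Dict[int, float]) -> Dict[int, float]:
        """Consume one frame of INPUT_SPEC and return the NOISE_SPEC estimate."""
        noise_spec: Dict[int, float] = {}
        for k, power in input_spec.items():
            n = self.hist.setdefault(k, [0.0] * self.num_bin)
            i_y = self._bin_index(power)
            # Eq. (50): decay all bins, then add (1 - alpha) to the current bin.
            n[:] = [self.alpha * v for v in n]
            n[i_y] += 1.0 - self.alpha
            # Eq. (51): cumulative sums S(t, l).
            s: List[float] = []
            running = 0.0
            for v in n:
                running += v
                s.append(running)
            target = self.lx * s[-1]
            # Eq. (52): first bin whose cumulative sum is closest to x * S(t, I_max).
            i_x = min(range(self.num_bin), key=lambda i: abs(target - s[i]))
            l_x = self.min_level + self.step_level * i_x  # Eq. (53)
            noise_spec[k] = 10.0 ** (l_x / 10.0)          # Eq. (54)
        return noise_spec
```

For example, with `lx=0.5`, `alpha=0.5`, 10 bins of 1 dB starting at 0 dB, and three frames whose levels are 2.5, 5.5, and 7.5 dB at frequency index 0, the estimate after the third frame is $10^{5/10}$ (the 5 dB bin); with `lx=1.0` it is $10^{7/10}$. The per-bin Python loops favor readability; a practical version would vectorize over frequencies.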

6.3.6.6 References

(1) H. Nakajima, G. Ince, K. Nakadai, and Y. Hasegawa: “An Easily-configurable Robot Audition System using Histogram-based Recursive Level Estimation”, Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2010 (to appear).