pyrato.parameters

Sub-module implementing calculations of room acoustic parameters from simulated or experimental data.

Functions:

clarity(energy_decay_curve[, early_time_limit])

Calculate the clarity from the energy decay curve (EDC).

definition(energy_decay_curve[, ...])

Calculate the definition from the energy decay curve (EDC).

early_lateral_energy_fraction(...)

Calculate the early lateral energy fraction.

late_lateral_sound_level(...)

Calculate the late lateral sound level.

modulation_transfer_function(rir[, ...])

Compute the modulation transfer function (MTF) of an impulse response according to IEC 60268-16:2020.

reverberation_time_linear_regression(...[, ...])

Estimate the reverberation time from a given energy decay curve.

sound_strength(energy_decay_curve_room, ...)

Calculate the room-acoustic strength parameter (\(G\)).

speech_transmission_index_indirect(rir[, ...])

Compute the Speech Transmission Index (STI) according to IEC 60268-16:2020 using the indirect method.

pyrato.parameters.clarity(energy_decay_curve, early_time_limit=80)

Calculate the clarity from the energy decay curve (EDC).

The clarity parameter (C50 or C80) is defined as the ratio of early-to-late arriving energy in an impulse response and is a measure of how clearly speech or music can be perceived in a room. The early-to-late boundary is typically set at 50 ms (C50) or 80 ms (C80) [1].

Clarity is calculated as:

\[C_{t_e} = 10 \log_{10} \frac{ \displaystyle \int_0^{t_e} p^2(t) \, dt }{ \displaystyle \int_{t_e}^{\infty} p^2(t) \, dt }\]

where \(t_e\) is the early time limit and \(p(t)\) is the pressure of a room impulse response. Here, the clarity is efficiently computed from the EDC \(e(t)\) directly by:

\[C_{t_e} = 10 \log_{10} \frac{e(0) - e(t_e)}{e(t_e) - e(\infty)} = 10 \log_{10} \left( \frac{e(0)}{e(t_e)} - 1 \right),\]

where \(e(\infty) = 0\) by definition of the EDC.
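The equivalence of the two expressions can be checked numerically. The following is a minimal NumPy sketch, not pyrato's implementation: it assumes an ideal exponential decay with a reverberation time of 1 s, builds the EDC by Schroeder backward integration, and uses the hypothetical helper `clarity_from_edc`.

```python
import numpy as np

fs = 1000                        # sampling rate in Hz (illustrative value)
t = np.arange(0, 3, 1 / fs)      # time axis starting at zero
T60 = 1.0                        # assumed reverberation time in seconds

# Squared pressure of an ideal exponential decay reaching -60 dB at T60
p2 = 10 ** (-6 * t / T60)

# Schroeder backward integration: e(t) is the energy remaining after time t
edc = np.cumsum(p2[::-1])[::-1] / fs


def clarity_from_edc(edc, times, early_time_limit=80):
    """Evaluate 10*log10(e(0)/e(t_e) - 1), with t_e given in milliseconds."""
    idx = np.argmin(np.abs(times - early_time_limit / 1000))
    return 10 * np.log10(edc[0] / edc[idx] - 1)


c80 = clarity_from_edc(edc, t, early_time_limit=80)

# Reference: early-to-late energy ratio taken from the integral definition
idx = np.argmin(np.abs(t - 0.08))
c80_direct = 10 * np.log10(p2[:idx].sum() / p2[idx:].sum())
```

Both values agree up to floating-point rounding, because `edc[0] - edc[idx]` is exactly the early energy and `edc[idx]` the late energy.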

Parameters:
  • energy_decay_curve (pyfar.TimeData) – Energy decay curve (EDC) of the room impulse response (time-domain signal). The EDC must start at time zero.

  • early_time_limit (float, optional) – Early time limit (\(t_e\)) in milliseconds. Defaults to 80 (C80). Typical values are 50 ms (C50) or 80 ms (C80) [1].

Returns:

clarity – Clarity index (early-to-late energy ratio) in decibels, shaped according to the channel shape of the input EDC.

Return type:

numpy.ndarray[float]

References

Examples

Estimate the clarity from a real room impulse response filtered in octave bands:

>>> import pyfar as pf
>>> import pyrato as ra
...
>>> rir = pf.signals.files.room_impulse_response(sampling_rate=44100)
>>> rir = pf.dsp.filter.fractional_octave_bands(rir, num_fractions=1)
>>> edc = ra.edc.energy_decay_curve_lundeby(rir)
...
>>> C80 = ra.parameters.clarity(edc, early_time_limit=80)
>>> C80
...     [[-55.57140506]
...     [-11.75657677]
...     [ -3.21150787]
...     [  2.76276817]
...     [  4.70786211]
...     [  5.98148157]
...     [  9.66764094]
...     [  9.08687417]
...     [ 14.14550646]
...     [ 21.60048332]]
pyrato.parameters.definition(energy_decay_curve, early_time_limit=50)

Calculate the definition from the energy decay curve (EDC).

The definition parameter (D50) is defined as the ratio of early-to-total arriving energy in an impulse response and is a measure of how distinctly speech or music can be perceived in a room. The early time limit is typically set at 50 ms (D50) [2].

Definition is calculated as:

\[D_{t_\mathrm{e}} = \frac{ \displaystyle \int_0^{t_\mathrm{e}} p^2(t) \, dt }{ \displaystyle \int_{0}^{\infty} p^2(t) \, dt }\]

where \(t_\mathrm{e}\) is the early time limit and \(p(t)\) is the pressure of a room impulse response. Here, the definition is efficiently computed from the EDC \(e(t)\) directly by:

\[D_{t_\mathrm{e}} = \frac{e(0) - e(t_\mathrm{e})}{e(0) - e(\infty)} = 1 - \left( \frac{e(t_\mathrm{e})}{e(0)} \right),\]

where \(e(\infty) = 0\) by definition of the EDC.
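As a plain NumPy illustration (a sketch under assumptions, not pyrato's code), the definition can be read off a synthetic EDC. The decay below is an assumed ideal exponential with a reverberation time of 1 s. The last lines also check the identity \(C_{t_\mathrm{e}} = 10 \log_{10} \left( D_{t_\mathrm{e}} / (1 - D_{t_\mathrm{e}}) \right)\), which follows from the early-to-late and early-to-total ratios when both use the same time limit.

```python
import numpy as np

fs = 1000                        # sampling rate in Hz (illustrative value)
t = np.arange(0, 3, 1 / fs)      # time axis starting at zero
p2 = 10 ** (-6 * t / 1.0)        # assumed ideal decay, reverberation time 1 s

# Schroeder backward integration gives the EDC
edc = np.cumsum(p2[::-1])[::-1] / fs

# Sample closest to the early time limit of 50 ms
idx = np.argmin(np.abs(t - 0.05))

# Definition: early-to-total energy ratio (dimensionless, between 0 and 1)
d50 = 1 - edc[idx] / edc[0]

# Clarity with the same time limit, once from the EDC and once from D50
c50 = 10 * np.log10(edc[0] / edc[idx] - 1)
c50_from_d50 = 10 * np.log10(d50 / (1 - d50))
```

Unlike clarity, the definition is a linear ratio and is not expressed in decibels.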

Parameters:
  • energy_decay_curve (pyfar.TimeData) – Energy decay curve (EDC) of the room impulse response (time-domain signal). The EDC must start at time zero.

  • early_time_limit (float, optional) – Early time limit (\(t_\mathrm{e}\)) in milliseconds. Defaults to typical value 50 (D50) [2].

Returns:

definition – Definition index (early-to-total energy ratio), shaped according to the channel shape of the input EDC.

Return type:

numpy.ndarray[float]

References

Examples

Estimate the definition from a real room impulse response filtered in octave bands:

>>> import pyfar as pf
>>> import pyrato
...
>>> rir = pf.signals.files.room_impulse_response(sampling_rate=44100)
>>> rir = pf.dsp.filter.fractional_octave_bands(
...     rir, num_fractions=1, frequency_range=(125, 20e3))
>>> edc = pyrato.edc.energy_decay_curve_lundeby(rir)
...
>>> D50 = pyrato.parameters.definition(edc, early_time_limit=50)
>>> D50
...     [[0.25984852]
...     [0.50208742]
...     [0.66722359]
...     [0.73528532]
...     [0.87801455]
...     [0.82757594]
...     [0.86536142]
...     [0.87374988]]
pyrato.parameters.early_lateral_energy_fraction(energy_decay_curve_omni, energy_decay_curve_lateral)

Calculate the early lateral energy fraction.

The early lateral energy fraction \(J_\mathrm{LF}\) according to [3] is defined as the ratio between the lateral sound energy arriving between 5 ms and 80 ms, captured with a figure-eight microphone, and the total sound energy arriving within the first 80 ms after the direct sound, captured with an omnidirectional microphone. It is a measure of the apparent source width.

The parameter is defined as

\[J_\mathrm{LF} = \frac{ \displaystyle \int_{0.005}^{0.08} p_\mathrm{L}^2(t)\,\mathrm{d}t }{ \displaystyle \int_{0}^{0.08} p^2(t)\,\mathrm{d}t }\]

where \(p_\mathrm{L}(t)\) is the lateral sound pressure measured with a figure-eight microphone whose zero axis is oriented towards the source, and \(p(t)\) is the sound pressure measured at the same position with an omnidirectional microphone.

Using the energy decay curves of the omnidirectional response \(e(t)\) and the lateral response \(e_\mathrm{L}(t)\), the parameter can be computed efficiently as

\[J_\mathrm{LF} = \frac{ e_\mathrm{L}(0.005) - e_\mathrm{L}(0.08) }{ e(0) - e(0.08) }.\]
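The EDC form can be verified against the integral definition with synthetic data. This NumPy sketch is illustrative only: the helper `schroeder` is hypothetical, and the lateral response is simply assumed to carry 30 % of the omnidirectional energy at all times.

```python
import numpy as np


def schroeder(p2, fs):
    """Backward-integrate a squared response into an energy decay curve."""
    return np.cumsum(p2[::-1])[::-1] / fs


fs = 1000                          # sampling rate in Hz (illustrative value)
t = np.arange(0, 2, 1 / fs)        # time axis starting at zero
p2_omni = 10 ** (-6 * t / 1.0)     # assumed omnidirectional decay
p2_lat = 0.3 * p2_omni             # assumed lateral (figure-eight) decay

e = schroeder(p2_omni, fs)
e_lat = schroeder(p2_lat, fs)

i5 = np.argmin(np.abs(t - 0.005))    # sample at 5 ms
i80 = np.argmin(np.abs(t - 0.08))    # sample at 80 ms

# EDC form: differences of the decay curves give the energy in each window
j_lf = (e_lat[i5] - e_lat[i80]) / (e[0] - e[i80])

# Integral form for comparison
j_lf_direct = p2_lat[i5:i80].sum() / p2_omni[:i80].sum()
```

Because the lateral window starts at 5 ms while the omnidirectional window starts at 0 ms, the result stays below the assumed energy share of 0.3.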
Parameters:
  • energy_decay_curve_omni (pyfar.TimeData) – Energy decay curve of the room impulse response measured with an omnidirectional microphone. The EDC must start at time zero.

  • energy_decay_curve_lateral (pyfar.TimeData) – Energy decay curve of the room impulse response measured with a figure-eight microphone oriented according to [3] (zero axis pointing towards the source). The EDC must start at time zero. Both EDCs must have identical signal.cshape.

Returns:

Early Lateral Energy Fraction – Early lateral energy fraction (\(J_\mathrm{LF}\)), shaped according to the channel shape of the input EDCs.

Return type:

numpy.ndarray

References

pyrato.parameters.late_lateral_sound_level(energy_decay_curve_free_field, energy_decay_curve_room_lateral)

Calculate the late lateral sound level.

The late lateral sound level \(L_\mathrm{J}\) quantifies the strength of late-arriving lateral sound energy. According to ISO 3382-1 [4], it is defined as the level ratio between the late lateral sound energy captured with a figure-of-eight microphone and the total sound energy of a reference impulse response measured with an omnidirectional microphone at a distance of 10 m in the free field. It is a measure of listener envelopment.

The parameter is defined as

\[L_\mathrm{J} = 10 \log_{10} \frac{ \displaystyle \int_{0.08}^{\infty} p_\mathrm{L}^2(t)\,\mathrm{d}t }{ \displaystyle \int_{0}^{\infty} p_{10}^2(t)\,\mathrm{d}t }\]

where \(p_\mathrm{L}(t)\) is the lateral sound pressure measured with a figure-eight microphone whose zero axis is oriented towards the source, and \(p_{10}(t)\) is the instantaneous sound pressure of the impulse response measured with an omnidirectional microphone at 10 m distance in the free field.

Using the energy decay curves of the reference response \(e_{10}(t)\) and the lateral response \(e_\mathrm{L}(t)\), the parameter can be computed efficiently as

\[L_\mathrm{J} = 10 \log_{10} \frac{ e_\mathrm{L}(0.08) }{ e_{10}(0) }.\]
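A minimal NumPy sketch of this ratio, assuming an idealised free-field reference that consists of a single impulse and an exponentially decaying lateral response (both are synthetic stand-ins, and `schroeder` is a hypothetical helper):

```python
import numpy as np


def schroeder(p2, fs):
    """Backward-integrate a squared response into an energy decay curve."""
    return np.cumsum(p2[::-1])[::-1] / fs


fs = 1000                               # sampling rate in Hz (illustrative)
t = np.arange(0, 2, 1 / fs)             # time axis starting at zero

p2_ref = np.zeros_like(t)
p2_ref[0] = 1.0                         # assumed free-field response at 10 m
p2_lat = 0.05 * 10 ** (-6 * t / 1.5)    # assumed lateral room response

e_ref = schroeder(p2_ref, fs)
e_lat = schroeder(p2_lat, fs)

i80 = np.argmin(np.abs(t - 0.08))       # sample at 80 ms

# Late lateral energy relative to the total reference energy, in decibels
l_j = 10 * np.log10(e_lat[i80] / e_ref[0])

# Integral form for comparison
l_j_direct = 10 * np.log10(p2_lat[i80:].sum() / p2_ref.sum())
```

Only one sample of each EDC is needed: the lateral EDC at 80 ms holds the late lateral energy, and the reference EDC at time zero holds the total reference energy.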
Parameters:
  • energy_decay_curve_free_field (pyfar.TimeData) – Energy decay curve of the reference impulse response measured with an omnidirectional microphone at 10 m distance in the free field. The EDC must start at time zero.

  • energy_decay_curve_room_lateral (pyfar.TimeData) –

    Energy decay curve of the room impulse response measured with a figure-eight microphone oriented according to [4] (zero axis pointing towards the source). The EDC must start at time zero.

    Both EDCs must have identical signal.cshape.

Returns:

Late Lateral Sound Level – Late lateral sound level (\(L_\mathrm{J}\)) in decibels, shaped according to the channel shape of the input EDCs.

Return type:

numpy.ndarray

References

pyrato.parameters.modulation_transfer_function(rir, rir_type='acoustical', level=None, snr=inf, ambient_noise_correction=True)

Compute the modulation transfer function (MTF) of an impulse response according to IEC 60268-16:2020.

The MTF describes the reduction of modulation depth caused by the transmission path. It is evaluated for 7 octave bands (125 Hz–8 kHz) and 14 modulation frequencies (0.63 Hz–12.5 Hz) and forms the basis of the Speech Transmission Index (STI).
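For intuition, the modulation transfer function of a single band can be obtained from the squared impulse response with Schroeder's relation, \(m(F) = \left| \int h^2(t)\, e^{-j 2 \pi F t}\, dt \right| / \int h^2(t)\, dt\). The sketch below applies it to an assumed ideal exponential decay and compares it with the closed-form result for such a decay. It is a simplified illustration and not pyrato's implementation: octave-band filtering, auditory masking and noise corrections are all omitted.

```python
import numpy as np

fs = 2000                         # sampling rate in Hz (illustrative value)
t = np.arange(0, 4, 1 / fs)       # time axis starting at zero
T60 = 1.0                         # assumed reverberation time in seconds
h2 = 10 ** (-6 * t / T60)         # squared response of one filtered band

# The 14 modulation frequencies in one-third-octave steps, 0.63 Hz to 12.5 Hz
f_mod = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                  3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])

# Fourier transform of h^2 at the modulation frequencies, normalised by
# the total energy
kernel = np.exp(-2j * np.pi * f_mod[:, None] * t[None, :])
mtf = np.abs(kernel @ h2) / h2.sum()

# Closed form for an ideal exponential decay
decay_rate = 6 * np.log(10) / T60
mtf_analytic = 1 / np.sqrt(1 + (2 * np.pi * f_mod / decay_rate) ** 2)
```

Repeating this for each of the 7 octave bands would give an array of shape (7, 14). Longer reverberation lowers the modulation depth, most strongly at high modulation frequencies.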

Parameters:
  • rir (pyfar.Signal) – Single-channel room impulse response with rir.cshape = (1,). The room impulse response must be at least 1.6 seconds long.

  • rir_type ({'electrical', 'acoustical'}, optional) – Determines whether input signals given by rir were obtained acoustically or electrically. Default is 'acoustical'. Auditory masking effects are only applied for acoustical signals [5], section A.3.1.

  • level (numpy.ndarray or None, optional) – Test signal level without noise in dB SPL for each of the 7 octave bands (125 Hz–8 kHz). Shape must be (7,). If None is provided, auditory and ambient noise corrections are omitted. Default is None. See [5], section A.3.2.

  • snr (numpy.ndarray or float, optional) – Signal-to-noise ratio in dB for each of the 7 octave bands (125 Hz–8 kHz). Arrays must have shape (7,). Default is np.inf (no ambient noise). See [5], section 3.

  • ambient_noise_correction (bool, optional) – Apply ambient noise correction according to [5], Annex A.2.3. Default is True. The correction is only applied if level is not None.

Returns:

mtf – Modulation transfer function with shape (7, 14).

Return type:

numpy.ndarray

Notes

pyfar uses octave-band filters of order 14 and the filter order influences the MTF. Higher filter orders produce steeper roll-off and a more ideal band separation, which affects the energy distribution within each octave band and thus the computed modulation depth. The influence on the broadband STI is negligible, since individual deviations in the MTF tend to average out in the weighted sum over octave bands and modulation frequencies.

References

pyrato.parameters.reverberation_time_linear_regression(energy_decay_curve, T='T20', return_intercept=False)

Estimate the reverberation time from a given energy decay curve.

The linear regression is performed using least squares error minimization according to the ISO standard 3382 [6].
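The regression idea can be sketched in a few lines of NumPy. The example assumes an ideal, noise-free EDC with a known decay time of 0.8 s and the T20 evaluation range from -5 dB to -25 dB. It illustrates the principle only and does not reproduce how pyrato selects samples or reports the intercept.

```python
import numpy as np

fs = 1000                            # sampling rate in Hz (illustrative)
t = np.arange(0, 2, 1 / fs)          # time axis starting at zero
T60_true = 0.8                       # assumed decay time in seconds
edc = 10 ** (-6 * t / T60_true)      # ideal EDC on a linear energy scale

# Normalised EDC in decibels
edc_db = 10 * np.log10(edc / edc[0])

# T20: fit a straight line to the part between -5 dB and -25 dB
mask = (edc_db <= -5) & (edc_db >= -25)
slope, intercept = np.polyfit(t[mask], edc_db[mask], 1)   # slope in dB/s

# Extrapolate the fitted slope to a decay of 60 dB
t20 = -60 / slope
```

For this ideal decay the fit recovers the assumed 0.8 s. With measured data the result depends on the chosen interval, which is why the interval is part of the parameter name.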

Parameters:
  • energy_decay_curve (pyfar.TimeData) – Energy decay curve.

  • T ('T15', 'T20', 'T30', 'T40', 'T50', 'T60', 'EDT', 'LDT') – Decay interval to be used for the reverberation time extrapolation. EDT corresponds to the early decay time extrapolated from the interval [0, -10] dB, LDT corresponds to the late decay time extrapolated from the interval [-25, -35] dB.

  • return_intercept (bool) – If True, the function returns the intercept of the linear regression, which corresponds to the amplitude of the energy decay curve on a linear scale. The default is False.

Returns:

reverberation_time – The estimated reverberation time in seconds.

Return type:

double

References

Examples

Estimate the reverberation time from an energy decay curve.

>>> import numpy as np
>>> import pyfar as pf
>>> import pyrato as ra
>>> from pyrato.analytic import rectangular_room_rigid_walls
...
>>> L = np.array([8, 5, 3])/10
>>> source_pos = np.array([5, 3, 1.2])/10
>>> receiver_pos = np.array([1, 1, 1.2])/10
>>> rir, _ = rectangular_room_rigid_walls(
...     L, source_pos, receiver_pos,
...     reverberation_time=1, max_freq=1.5e3, n_samples=2**12,
...     speed_of_sound=343.9, samplingrate=3e3)
>>> rir = rir/rir.time.max()
...
>>> awgn = pf.signals.noise(
...     rir.n_samples, rms=10**(-50/20),
...     sampling_rate=rir.sampling_rate)
>>> rir = rir + awgn
...
>>> edc = ra.energy_decay_curve_chu_lundeby(rir)
>>> t_20 = ra.parameters.reverberation_time_linear_regression(edc, 'T20')
>>> t_20
...     array([0.99526253])
pyrato.parameters.sound_strength(energy_decay_curve_room, energy_decay_curve_free_field)

Calculate the room-acoustic strength parameter (\(G\)).

The strength parameter (\(G\)) is defined as the ratio between the total arriving sound energy and the total arriving sound energy of a reference free-field response measured at 10 m with the same source. It is a measure of the room-induced level amplification at the receiver position [7].

The parameter is defined as

\[G = 10 \log_{10} \frac{ \displaystyle \int_{0}^{\infty} p^2(t)\,dt }{ \displaystyle \int_{0}^{\infty} p_\mathrm{10}^2(t)\,dt }\]

where \(p(t)\) is the room sound pressure and \(p_\mathrm{10}(t)\) is the reference free-field sound pressure at 10 m measured with the same loudspeaker.

Using the energy decay curves of the room response \(e(t)\) and the reference response \(e_\mathrm{10}(t)\), the parameter can be computed efficiently as

\[G = 10 \log_{10} \frac{ e(0) - e(\infty) }{ e_\mathrm{10}(0) - e_\mathrm{10}(\infty) }.\]
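Since \(e(\infty) = 0\), only the first sample of each EDC enters the ratio. The NumPy sketch below uses assumed synthetic data with three channels to show how the result follows the channel shape of the inputs. It is an illustration and not pyrato's code.

```python
import numpy as np

fs = 1000                                # sampling rate in Hz (illustrative)
t = np.arange(0, 2, 1 / fs)              # time axis starting at zero

# Assumed room responses: three channels that differ only in level
gains = np.array([0.5, 1.0, 2.0])
p2_room = gains[:, None] * 10 ** (-6 * t / 1.2)     # shape (3, n_samples)

# Assumed idealised free-field reference at 10 m: a single impulse
p2_ref = np.zeros((3, t.size))
p2_ref[:, 0] = 4.0

# Schroeder backward integration along the time axis
e_room = np.cumsum(p2_room[:, ::-1], axis=-1)[:, ::-1] / fs
e_ref = np.cumsum(p2_ref[:, ::-1], axis=-1)[:, ::-1] / fs

# Strength in decibels, one value per channel
strength = 10 * np.log10(e_room[:, 0] / e_ref[:, 0])
```

Doubling the energy of the room response raises the strength by about 3 dB, which is visible in the difference between neighbouring channels.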
Parameters:
  • energy_decay_curve_room (pyfar.TimeData) – Energy decay curve of the room impulse response. The EDC must start at time zero.

  • energy_decay_curve_free_field (pyfar.TimeData) – Energy decay curve of the reference free-field impulse response at 10 m. The EDC must start at time zero. Both EDCs must have identical signal.cshape.

Returns:

strength – Strength parameter (\(G\)) in decibels, shaped according to the channel shape of the input EDC.

Return type:

numpy.ndarray

References

pyrato.parameters.speech_transmission_index_indirect(rir, rir_type='acoustical', level=None, snr=inf, ambient_noise_correction=True)

Compute the Speech Transmission Index (STI) according to IEC 60268-16:2020 using the indirect method.

The STI is a scalar measure between 0 (bad) and 1 (excellent) describing speech intelligibility. It is computed from the modulation_transfer_function, optionally including auditory masking and ambient noise effects.

STI considers 7 octave bands from 125 Hz to 8 kHz and 14 modulation frequencies between 0.63 Hz and 12.5 Hz [8].
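The step from a (7, 14) modulation transfer function to a single index can be sketched as follows: each modulation value is converted to an effective signal-to-noise ratio, limited to ±15 dB, mapped to a transmission index between 0 and 1, averaged per octave band, and combined with band weights. The function name `sti_from_mtf` is hypothetical, and the weighting and redundancy factors are the male-speech values as commonly tabulated for IEC 60268-16; treat them as an assumption and check them against the standard. Auditory masking and ambient noise are not considered here.

```python
import numpy as np

# Assumed male-speech weighting (ALPHA) and redundancy (BETA) factors for
# the octave bands 125 Hz to 8 kHz; verify against IEC 60268-16 before use.
ALPHA = np.array([0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173])
BETA = np.array([0.085, 0.078, 0.065, 0.011, 0.047, 0.095])


def sti_from_mtf(mtf):
    """Combine an MTF of shape (7, 14) into a single STI value."""
    # Keep m strictly inside (0, 1) so the logarithm stays finite
    m = np.clip(mtf, 1e-12, 1 - 1e-12)
    # Effective signal-to-noise ratio in dB, limited to +/- 15 dB
    snr_eff = np.clip(10 * np.log10(m / (1 - m)), -15, 15)
    # Transmission index between 0 and 1
    ti = (snr_eff + 15) / 30
    # Modulation transfer index: mean over the 14 modulation frequencies
    mti = ti.mean(axis=-1)
    # Weighted sum with redundancy correction between adjacent bands
    return np.sum(ALPHA * mti) - np.sum(BETA * np.sqrt(mti[:-1] * mti[1:]))


sti_half = sti_from_mtf(np.full((7, 14), 0.5))    # modulation depth halved
sti_clean = sti_from_mtf(np.ones((7, 14)))        # modulation fully preserved
```

With these factors a fully preserved modulation gives an STI of 1, and a modulation depth of 0.5 in every band (an effective signal-to-noise ratio of 0 dB) gives 0.5.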

Parameters:
  • rir (pyfar.Signal) – Single or multi-channel room impulse response for which the STI is computed. The room impulse response must be at least 1.6 seconds long. See [8], Section 6.2.

  • rir_type ({'electrical', 'acoustical'}, optional) – Determines whether input signals given by rir were obtained acoustically or electrically. Default is 'acoustical'. Auditory masking effects are only applied for acoustical signals [8], section A.3.1.

  • level (numpy.ndarray or None, optional) – Test signal level without noise in dB SPL for each of the 7 octave bands (125 Hz–8 kHz). Shape can be (7,) to use the same values for all channels, or (rir.cshape, 7) for channel-specific values. If None is provided, auditory and ambient noise corrections are omitted. Default is None. See [8], section A.3.2.

  • snr (numpy.ndarray or float, optional) – Signal-to-noise ratio in dB for each of the 7 octave bands (125 Hz–8 kHz). Shape can be (7,) to use the same values for all channels, or (rir.cshape, 7) for channel-specific values. Default is np.inf (no ambient noise). See [8], section 3.

  • ambient_noise_correction (bool, optional) – Apply ambient noise correction according to [8], Annex A.2.3. Default is True. The correction is only applied if level is not None.

Returns:

sti – Channel-wise Speech Transmission Index with shape rir.cshape.

Return type:

numpy.ndarray

Notes

pyfar uses octave-band filters of order 14 and the filter order influences the MTF. Higher filter orders produce steeper roll-off and a more ideal band separation, which affects the energy distribution within each octave band and thus the computed modulation depth. The influence on the broadband STI is negligible, since individual deviations in the MTF tend to average out in the weighted sum over octave bands and modulation frequencies.

References