Sound quantization analysis

Pure tones do not exist naturally: every sound in the world is the sum of multiple pure tones of different amplitudes. A musical piece is played by multiple instruments and singers. All these instruments produce a combination of sine waves at multiple frequencies, and overall the result is an even bigger combination of sine waves.

A spectrogram is a very detailed and accurate image of your audio, displayed in 2D or 3D. Audio is plotted against time and frequency, with brightness (or height, in 3D) indicating amplitude. While a waveform shows how the amplitude of the signal changes over time, the spectrogram shows this change for each frequency component in the signal. For example, in Fig. 4 you can see that the impact of droplets consistently forms large surface bubbles and the characteristic "bloop" noise; color represents amplitude in dB. In this spectrogram some frequencies are more important than others, and that observation is what lets us build a fingerprint detection algorithm.

Analog signals are continuous, which means that if you take one second of an analog signal, you can divide that second into ever smaller parts. In the digital world you cannot store an infinite amount of information: you must choose a minimum unit of time, such as 1 millisecond. During this unit the sound cannot change, so the unit must be short enough for the digital song to sound like the analog one, yet large enough to limit the space needed to store the music. The Nyquist sampling theorem prescribes the nominal sampling interval required to avoid aliasing. It can be stated simply as follows: the sampling frequency should be at least twice the highest frequency contained in the signal.
Or, in mathematical terms: fs ≥ 2 fc, where fs is the sampling frequency (how often samples are taken per unit of time or space) and fc is the highest frequency contained in the signal. The Nyquist-Shannon theorem states that if you want to digitize a signal containing frequencies from 0 Hz to 20 kHz, you need more than 40,000 samples per second. The standard sampling rate for digital music in the music industry is 44.1 kHz, and each sample is assigned 16 bits. Some statements of the theorem describe this process as a perfect reconstruction of the signal. The main idea is that a sinusoid at frequency F needs at least 2 points per cycle to be identified: if the sampling rate is at least double the signal frequency, you end up with at least 2 points per cycle of the original signal.

Sampling, the process of converting a signal into a digital sequence, is also called analog-to-digital conversion. Quantization is the companion conversion process: the approximate measurement of each sample. Analog-to-digital converters and digital-to-analog converters encode and decode these signals to record our voices, display images on screen, or play audio clips through speakers. Because we can digitize media, we can manage, recreate, alter, produce and store text, images and sounds. The theorem, simple as it may seem, has changed the way our modern digital world works. We can use media uniformly to our advantage in multiple ways, and the limitations we face can be addressed through filters and by adjusting frequencies or sample rates. Although the sampled signal does not have the same shape or the same amplitude, its frequency remains the same. Analog-to-digital converters perform exactly this function, creating a series of digital values from a given analog signal. The following figure represents an analog signal; to be converted into digital form, this signal must undergo sampling and quantization.
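The two-points-per-cycle idea can be sketched numerically. The following Python snippet is illustrative (the 6 kHz tone and 8 kHz undersampling rate are chosen for the example, not taken from the text): sampling below the Nyquist rate makes the tone indistinguishable from a lower-frequency alias.

```python
import math

def sample_sine(freq_hz, fs_hz, n_samples):
    """Return n_samples of a unit-amplitude sine of frequency freq_hz,
    taken at sampling rate fs_hz."""
    return [math.sin(2 * math.pi * freq_hz * k / fs_hz) for k in range(n_samples)]

# Sampling a 6 kHz tone at 44.1 kHz respects fs >= 2*fc: Nyquist is satisfied.
well_sampled = sample_sine(6000, 44100, 16)

# Sampling the same tone at only 8 kHz violates the theorem: the 6 kHz
# component aliases down to |8000 - 6000| = 2000 Hz, so the samples are
# indistinguishable (up to sign) from those of a genuine 2 kHz tone.
aliased = sample_sine(6000, 8000, 16)
mirror = sample_sine(2000, 8000, 16)
```

Comparing `aliased` and `mirror` element by element shows they carry exactly the same information, which is why no amount of post-processing can recover the original 6 kHz tone.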
Sound quantization analysis

Quantization is the process of mapping input values from a large set (often a continuous set) to output values in a smaller (countable) set. Rounding and truncation are typical examples of quantization processes. Quantization is involved to some extent in almost all digital signal processing, since representing a signal in digital form normally involves some form of rounding; it also forms the core of essentially all lossy compression algorithms. Quantization makes the range of a signal discrete, so that the quantized signal takes on only a set of discrete, usually finite, values. Unlike sampling, quantization is generally irreversible and results in loss of information; it therefore introduces distortion into the quantized signal that cannot be eliminated.

One of the fundamental choices in quantization is the number of discrete quantization levels to use. The fundamental trade-off in this choice is the quality of the resulting signal versus the amount of data needed to represent each sample. Fig. 6 shows an analog signal and quantized versions for different numbers of quantization levels. With L levels, we need N = log2(L) bits to represent the different levels or, conversely, with N bits we can represent L = 2^N levels.

Pulse code modulation of sound

Pulse code modulation (PCM) is a system used to translate analog signals into digital data. It is used by compact discs and most electronic devices. For example, when you listen to an mp3 file on your computer/phone/tablet, the mp3 is automatically transformed into a PCM signal and then sent to your headphones. A PCM stream is a stream of organized bits, and it can be composed of multiple channels; stereo music, for example, has 2 channels. In a stream, the signal amplitude is divided into samples, and the number of samples per second corresponds to the sampling rate of the music. For example, music sampled at 44.1 kHz has 44,100 samples per second.
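The L = 2^N trade-off can be sketched with a small helper (hypothetical, not from the text) that rounds an amplitude in [-1, 1] to one of 2^N signed integer codes, the way 16-bit PCM does. The rounding step is exactly the irreversible loss of information described above.

```python
def quantize(x, n_bits):
    """Map an amplitude x in [-1.0, 1.0] onto one of L = 2**n_bits discrete
    levels, returned as a signed integer code (16 bits gives -32768..32767)."""
    half = 2 ** (n_bits - 1)
    code = round(x * (half - 1))   # rounding discards information irreversibly
    return max(-half, min(half - 1, code))

# With more bits (more levels), the round-trip distortion shrinks:
x = 0.3
for bits in (4, 8, 16):
    code = quantize(x, bits)
    restored = code / (2 ** (bits - 1) - 1)
    # |x - restored| decreases as bits increase, but storage per sample grows
```

This makes the trade-off concrete: a 4-bit code is 4x smaller than a 16-bit one but reconstructs 0.3 only as roughly 0.286, while 16 bits gets within about 3e-6.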
Each sample gives the (quantized) amplitude of the sound for the corresponding fraction of a second. There are several PCM formats, but the most used in audio is linear stereo PCM at 44.1 kHz with a 16-bit depth. This format has 44,100 samples for each second of music, and each sample occupies 4 bytes (Fig. 7):

- 2 bytes (16 bits) for the intensity (from -32,768 to 32,767) of the left speaker
- 2 bytes (16 bits) for the intensity (from -32,768 to 32,767) of the right speaker

In 16-bit 44.1 kHz stereo PCM, you have 44,100 such samples for every second of music.

Discrete Fourier transform algorithm

The DFT (discrete Fourier transform) applies to discrete signals and provides a discrete spectrum (the frequencies within the signal). It is a method for converting a sequence of N complex numbers x(0), x(1), ..., x(N-1) into a new sequence of N complex numbers:

X(n) = sum over k = 0 .. N-1 of x(k) * e^(-i*2*pi*k*n/N)

In this formula:
- N is the size of the window: the number of samples that compose the signal
- X(n) represents the nth frequency bin
- x(k) is the kth sample of the audio signal

The DFT is useful in many applications, including simple spectral analysis of the signal. Knowing how a signal can be expressed as a combination of waves allows manipulation of that signal and comparison of different signals: digital files (jpg, mp3, etc.) can be shrunk by eliminating the contributions of the least important waves in the combination; different audio files can be compared by comparing the X(n) coefficients of their DFTs; radio signals can be filtered to suppress "noise" and retain the important components of the signal. Other applications of the DFT arise because it can be calculated very efficiently using the fast Fourier transform (FFT) algorithm.
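The DFT formula can be transcribed almost symbol for symbol into a (deliberately naive, O(N²)) Python function, which makes N, x(k) and X(n) concrete:

```python
import cmath
import math

def dft(x):
    """Direct evaluation of X(n) = sum_{k=0}^{N-1} x(k) * e^(-i*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * n / N) for k in range(N))
            for n in range(N)]

# A cosine completing exactly 3 cycles in an N = 8 sample window puts its
# energy into frequency bins 3 and N - 3 = 5 (the mirrored negative frequency).
N = 8
signal = [math.cos(2 * math.pi * 3 * k / N) for k in range(N)]
magnitudes = [abs(X) for X in dft(signal)]
```

For a real unit-amplitude cosine on an exact bin, the two matching bins each have magnitude N/2 and every other bin is zero, which is the "combination of waves" decomposition in its simplest form.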
For example, the DFT is used in state-of-the-art algorithms to multiply polynomials and large integers together: instead of working directly with polynomial multiplication, it is faster to compute the DFTs of the polynomials and convert the multiplication into an analogous problem involving their DFTs.

Window functions

In signal processing, a window function is a mathematical function that is zero outside a chosen range. For example, a function that is constant within the range and zero elsewhere is called a rectangular window, after the shape of its graph. When another function or waveform/data sequence is multiplied by a window function, the product is also zero outside the range: all that remains is the part where they overlap, the "view through the window." In typical applications the window functions used are non-negative, smooth, "bell"-shaped curves, though rectangular, triangular and other shapes can also be used. A more general definition of window functions does not require them to be identically zero outside an interval, as long as the product of the window and its argument is square-integrable and, more specifically, the function goes sufficiently quickly towards zero.

The Fourier transform of the function cos(ωt) is zero except at the frequencies ±ω. However, many other functions and waveforms do not have convenient closed-form transforms. Alternatively, one might be interested in their spectral content only during a certain period of time. In either case, the Fourier transform (or a similar transform) can be applied to one or more finite intervals of the waveform; in general, the transform is applied to the product of the waveform and a window function. Any window (including the rectangular one) affects the spectral estimate calculated with this method.
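The rectangular window definition translates directly into code. This is a minimal sketch of the "view through the window" idea; the function name and interval convention are illustrative, not from the text:

```python
def rectangular_window(signal, start, stop):
    """Multiply a signal by a rectangular window that is 1 on [start, stop)
    and 0 elsewhere: only the 'view through the window' survives."""
    return [x if start <= k < stop else 0.0
            for k, x in enumerate(signal)]

# Everything outside samples 3..6 is zeroed out.
windowed = rectangular_window([1.0] * 10, 3, 7)
```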
Windowing a simple waveform like cos(ωt) causes its Fourier transform to develop non-zero values (commonly called spectral leakage) at frequencies other than ω. The leakage tends to be worst (highest) near ω and least at the frequencies farthest from ω. If the waveform under analysis contains two sinusoids of different frequencies, leakage can interfere with our ability to distinguish them spectrally. If their frequencies are dissimilar and one component is weaker, leakage from the stronger component can obscure the presence of the weaker one. But if the frequencies are similar, leakage can render them unresolvable even when the sinusoids are of equal strength. The rectangular window has excellent resolution characteristics for sinusoids of comparable strength, but it is a poor choice for sinusoids of disparate amplitudes; this characteristic is sometimes described as low dynamic range. At the other end of the dynamic range are the windows with the poorest resolution and sensitivity, i.e. the ability to detect relatively weak sinusoids in the presence of additive random noise. That is because the noise produces a stronger response with high-dynamic-range windows than with high-resolution windows. Therefore, high-dynamic-range windows are most often justified in broadband applications, where the analyzed spectrum is expected to contain many different components of various amplitudes. Between the extremes are moderate windows, such as Hamming and Hann, which are commonly used in narrowband applications, such as the spectrum of a telephone channel. In summary, spectral analysis involves a trade-off between resolving comparable-strength components with similar frequencies and resolving disparate-strength components with dissimilar frequencies; that trade-off is made when the window function is chosen. When the input waveform is sampled in time, rather than continuous, the analysis is usually performed by applying a window function and then a discrete Fourier transform (DFT).
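The two moderate windows named above are easy to generate. This sketch uses the standard symmetric definitions (an assumption, since the text gives no formulas): both taper the ends of the analysis block, which is what reduces leakage compared to the rectangular window.

```python
import math

def hann(N):
    """Symmetric Hann window: 0.5 * (1 - cos(2*pi*k/(N-1))); zero at both ends."""
    return [0.5 * (1 - math.cos(2 * math.pi * k / (N - 1))) for k in range(N)]

def hamming(N):
    """Symmetric Hamming window: like Hann, but raised so the ends sit at 0.08."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (N - 1)) for k in range(N)]

# Tapering a sinusoid with the Hann window removes the discontinuity at the
# edges of the analysis block; the tapered block starts and ends at zero.
N = 64
tone = [math.cos(2 * math.pi * 5 * k / N) for k in range(N)]
tapered = [w * s for w, s in zip(hann(N), tone)]
```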
But the DFT provides only a sparse sampling of the underlying discrete-time Fourier transform (DTFT) spectrum. Fig. 8 shows a portion of the DTFT of a rectangularly windowed sine wave. The actual frequency of the sine wave is indicated as "0" on the horizontal axis; everything else is leakage, exaggerated by the use of a logarithmic presentation. The frequency unit is "DFT bins"; that is, the integer values on the frequency axis correspond to the frequencies sampled by the DFT. So the figure depicts a case where the actual frequency of the sine wave coincides with a DFT sample, and the maximum value of the spectrum is accurately measured by that sample. When the true frequency misses a DFT sample by some amount (up to ½ bin), the measurement error is called scalloping loss (inspired by the shape of the peak). For a known frequency, such as a musical note or a sinusoidal test signal, matching the frequency to a DFT bin can be arranged by choosing a sampling rate and a window length that result in an integer number of cycles within the window.

In signal processing, operations are chosen to improve some aspect of a signal's quality by exploiting the differences between the signal and the corrupting influences. When the signal is a sinusoid corrupted by additive random noise, spectral analysis distributes the signal and noise components differently, often making it easier to detect the signal's presence or measure certain characteristics, such as amplitude and frequency. In effect, the signal-to-noise ratio (SNR) is improved by spreading the noise evenly while concentrating most of the sine wave's energy around one frequency. Processing gain is a term often used to describe this improvement in SNR. The processing gain of spectral analysis depends on the window function: both its noise bandwidth and its potential scalloping loss. These effects partially offset each other, because the windows with the least scalloping naturally have the most leakage.
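The ½-bin worst case can be demonstrated numerically. In this sketch (window length and frequencies are illustrative, not from the text), a tone landing exactly on a bin is compared with one falling halfway between two bins, under the implicit rectangular window:

```python
import cmath
import math

def bin_magnitude(signal, n):
    """Magnitude of DFT bin n, evaluated directly from the DFT sum."""
    N = len(signal)
    return abs(sum(signal[k] * cmath.exp(-2j * math.pi * k * n / N)
                   for k in range(N)))

N = 64
# A tone completing exactly 8 cycles in the window hits bin 8 dead on.
on_bin = [math.cos(2 * math.pi * 8 * k / N) for k in range(N)]
# A tone at 8.5 cycles falls halfway between bins 8 and 9: the worst case.
off_bin = [math.cos(2 * math.pi * 8.5 * k / N) for k in range(N)]

peak_on = bin_magnitude(on_bin, 8)                    # exactly N/2 = 32
peak_off = max(bin_magnitude(off_bin, n) for n in range(N // 2))
scalloping_loss_db = 20 * math.log10(peak_on / peak_off)
```

The measured peak of the half-bin tone comes out around 4 dB below the on-bin peak, which is the rectangular window's scalloping loss; tapered windows trade a lower figure here for a wider main lobe.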
The frequencies of the sinusoids are chosen such that one encounters no scalloping and the other encounters maximum scalloping. Both sinusoids suffer less SNR loss under the Hann window than under the Blackman-Harris window. In general (as mentioned above), this is a deterrent to using high-dynamic-range windows in low-dynamic-range applications.

The human ear automatically and involuntarily performs a calculation that takes years of mathematical education to master. The ear formulates a transform by converting sound (the pressure waves traveling through the atmosphere over time) into a spectrum, a description of the sound as a series of volumes at distinct pitches. The brain then turns this information into perceived sound. A similar conversion can be performed using mathematical methods on the same sound waves, or on virtually any other fluctuating signal that varies with time. The Fourier transform is the mathematical tool used to perform this conversion. Simply put, the Fourier transform converts time-domain waveform data into the frequency domain. It achieves this by breaking the original time-based waveform down into a series of sinusoidal terms, each with a unique amplitude, frequency, and phase. This process, in effect, converts a time-domain waveform that is difficult to describe mathematically into a more manageable series of sinusoidal functions that, when added together, exactly reproduce the original waveform. Plotting the amplitude of each sinusoidal term against its frequency creates a power spectrum, which is the response of the original waveform in the frequency domain. Fig. 10 illustrates this concept of conversion into the frequency domain.

The Fourier transform has become a powerful analytical tool in several fields of science. In some cases it provides a means of solving complex equations describing dynamic responses to electricity, heat, or light.
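The "series of sinusoidal terms" view can be made concrete with two hypothetical components (the amplitudes, frequencies and phases below are invented for illustration): summing the terms reproduces the waveform, and listing amplitude against frequency is its spectrum.

```python
import math

# Two hypothetical sinusoidal terms: (amplitude, frequency in Hz, phase).
# Plotting amplitude against frequency (1.0 @ 2 Hz, 0.5 @ 5 Hz) would be
# the power-spectrum view of this waveform.
terms = [(1.0, 2.0, 0.0), (0.5, 5.0, math.pi / 4)]

def waveform(t):
    """Time-domain value: the sum of all sinusoidal terms at time t."""
    return sum(a * math.sin(2 * math.pi * f * t + p) for a, f, p in terms)

samples = [waveform(k / 100.0) for k in range(100)]  # one second at 100 Hz
```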
In other cases, it can identify the regular contributions to a fluctuating signal, thereby helping to make sense of observations in astronomy, medicine and chemistry. Perhaps because of its usefulness, the Fourier transform has been adapted for use on personal computers: algorithms have been developed that pair the personal computer's capacity for evaluating large quantities of numbers with the Fourier transform, providing a PC-based solution for representing waveform data in the frequency domain.

The fast Fourier transform (FFT) is a computationally efficient method for generating a Fourier transform. Its main advantage is speed, achieved by decreasing the number of calculations needed to analyze a waveform. A disadvantage of the FFT is the limited range of waveform lengths that can be transformed, together with the need to apply a window weighting function to the waveform to compensate for spectral leakage. The FFT is simply a faster implementation of the DFT: the FFT algorithm reduces an n-point Fourier transform to approximately (n/2) log2(n) complex multiplications. For example, computed directly, a DFT on 1,024 (i.e. 2^10) data points would require n^2 = 1,024 × 1,024 = 2^20 = 1,048,576 multiplications. The FFT algorithm reduces this to approximately (n/2) log2(n) = 512 × 10 = 5,120 multiplications, an improvement by a factor of about 200.

But the increase in speed comes at the expense of versatility. The FFT function automatically places some restrictions on the time series to be evaluated in order to generate a meaningful and accurate frequency response. Because the FFT function is built on a radix of 2, the interval or length of the time series to be evaluated must contain a number of data points exactly equal to a power of 2 (for example, 512, 1024, 2048, etc.). Therefore, with an FFT you can only evaluate a fixed-length waveform containing 512 points, or 1024 points, or 2048 points, and so on.
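The operation counts above can be checked with two one-line cost models (a sketch of the counting argument, not a benchmark):

```python
import math

def direct_dft_mults(n):
    """Cost model for the direct DFT: n**2 complex multiplications."""
    return n * n

def fft_mults(n):
    """Cost model for a radix-2 FFT: roughly (n/2) * log2(n) multiplications."""
    return (n // 2) * int(math.log2(n))

n = 1024
direct = direct_dft_mults(n)   # 1,048,576
fast = fft_mults(n)            # 512 * 10 = 5,120
speedup = direct / fast        # a factor of about 200, as in the text
```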
For example, if your time series contains 1096 data points, an FFT can evaluate only 1024 of them at a time, since 1024 is the largest power of 2 that is less than 1096. Because of this power-of-2 limitation, yet another problem materializes: when a waveform is evaluated by an FFT, a section of the waveform is bounded to encompass exactly 512 points, or 1024 points, and so on.
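The "largest power of 2 that fits" rule is a one-loop computation; this helper (hypothetical name, not from the text) reproduces the 1096 → 1024 example above:

```python
def usable_fft_length(n_points):
    """Largest power of two not exceeding n_points: the portion of the
    time series a radix-2 FFT can evaluate in one pass."""
    p = 1
    while p * 2 <= n_points:
        p *= 2
    return p

truncated_to = usable_fft_length(1096)  # only 1024 of the 1096 points are used
```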