USB Audio Basics

Index: USB Basics; USB Audio; What is a Second Between Friends?; Multiple Clock Sources; Compliance and Native Support; Summary

USB, the Universal Serial Bus, has been around for decades and is a widely used standard in the world of personal computers. Memory sticks, external drives, mice, and webcams are all interfaced via USB. In this article we look at USB audio: a digital audio standard used in PCs, smartphones, and tablets to interface with audio peripherals such as speakers, microphones, or mixers. We set out to show how USB audio works, what to pay attention to, and how to use USB audio for high-fidelity multi-channel input and output.

USB Basics

USB is a protocol in which the PC, the USB host, initiates a transfer and the device (for example, a USB speaker) responds. Each transfer is addressed to a specific device and a specific endpoint on that device. IN transfers send data to the PC: when the host initiates an IN transfer, the device must respond with data for the host. OUT transfers send data to the device: when the host performs an OUT transfer, it sends a data packet for the device to receive. In the world of USB audio, IN and OUT transfers can be used to transport audio samples: an OUT transfer sends audio data from a PC to a speaker, while an IN transfer sends audio data from a microphone to a PC.

There are four types of IN and OUT transfers in USB: bulk, isochronous, interrupt, and control transfers.

A bulk transfer is used to reliably transfer data between host and device. All USB transfers carry a CRC (checksum) that indicates whether an error occurred. In a bulk transfer, the recipient of the data must verify the CRC. If the CRC is correct, the transfer is acknowledged and the data is assumed to have been transferred without errors. If the CRC is incorrect, the transfer is not acknowledged and will be retried.
If the device is not ready to accept the data, it can send a negative acknowledgment, NAK, which causes the host to retry the transfer. Bulk transfers are not considered time-critical and are scheduled around the time-critical transfers discussed below.

Isochronous transfers are used to transfer data in real time between host and device. When an isochronous endpoint is configured by the host, the host allocates a specific amount of bandwidth to that endpoint and regularly performs an IN or OUT transfer on it. For example, the host can send 1 KByte of data to the device every 125 µs. Because a fixed and limited amount of bandwidth has been allocated, there is no time to resend data if something goes wrong. The data carries a CRC as usual, but if the receiving side detects an error there is no resend mechanism.

Interrupt transfers are used by the host to regularly poll the device to find out whether anything of interest has happened. For example, a host can poll an audio device to see if the MUTE button has been pressed. The name "interrupt transfer" is a bit confusing, since these transfers do not interrupt anything. However, regular polling provides the same kind of functionality that a host interrupt would provide.

Control transfers are very similar to bulk transfers. Control transfers are acknowledged, can be NAKed, and are delivered in a non-real-time manner. Control transfers are used for operations outside the normal data flow, such as executing queries about device capabilities or endpoint status. An explanation of how a device's capabilities are described is beyond the scope of this article; we simply state that there are predefined classes such as "USB Audio Class" or "USB Mass Storage Class" that allow for interoperability between platforms.

All transfers take place in USB frames. High-speed USB frames last 125 µs (full-speed USB frames last 1 ms) and are marked by the host sending a Start-Of-Frame (SOF) message.
Isochronous and interrupt transfers are transmitted at most once per frame.

USB Audio

USB audio uses isochronous, interrupt, and control transfers. All audio data is transferred via isochronous transfers; interrupt transfers are used to convey information regarding the availability of audio clocks; control transfers are used to set the volume, request sample rates, and so on.

The data requirements of a USB audio system depend on the number of channels, the number of bits used to represent each sample, and the sampling rate. The typical number of channels is 2 (stereo), 6 (5.1), or much higher for studio and DJ use. Typically the sample size is 24 bits, although 16 bits are available for legacy audio and 32 bits for high-quality audio. Typical sampling rates are 44.1, 48, 96, and 192 kHz. The latter is used for high-quality audio.

Suppose you design a stereo audio speaker with a sampling rate of 96 kHz and 24-bit samples. To simplify data marshalling across host and device, 24-bit values are typically padded with a zero byte, so the total data throughput is 96,000 x 2 channels x 4 bytes = 768,000 bytes per second. Isochronous endpoints operate at a rate of one transfer every 125 µs, or 8,000 transfers per second. Dividing the required byte rate by the transfer rate gives the number of bytes for each isochronous transfer: 768,000 / 8,000 = 96 bytes per transfer.

When using CD rates such as 44,100 Hz, the same calculation gives 44,100 x 2 channels x 4 bytes / 8,000 = 44.1 bytes per transfer. In USB audio each transfer always carries a whole number of samples, so transfers carry either 48 or 40 bytes (6 or 5 stereo samples), mixed so that the average rate works out to 44.1 bytes per transfer.

A single isochronous transfer can carry 1024 bytes, which is at most 256 samples (24/32 bit, 4 bytes each). At 48 kHz each channel needs 6 samples per transfer and at 192 kHz it needs 24, so a single isochronous endpoint can transfer 42 channels at 48 kHz or 10 channels at 192 kHz. This assumes high-speed USB; full-speed USB cannot carry more than a single stereo IN and OUT pair at 48 kHz.
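
The throughput arithmetic above can be reproduced with a short Python sketch. This is an illustration only: the function names are hypothetical, the constants are the figures quoted in the text (4-byte padded samples, 8,000 transfers per second, 1024 bytes per transfer), and the accumulator used to choose between 5 and 6 stereo samples per transfer is just one simple way to hit the right average, not necessarily how any particular host schedules its packets.

```python
from fractions import Fraction

# Figures taken from the text (high-speed USB, 24-bit samples padded to 4 bytes).
TRANSFERS_PER_SECOND = 8_000      # one isochronous transfer per 125 µs frame
BYTES_PER_SAMPLE = 4              # 24-bit value plus one zero padding byte
MAX_SAMPLES_PER_TRANSFER = 256    # 1024 bytes / 4 bytes per sample


def average_bytes_per_transfer(sample_rate_hz, channels):
    """Exact average payload per transfer, as a fraction."""
    bytes_per_second = sample_rate_hz * channels * BYTES_PER_SAMPLE
    return Fraction(bytes_per_second, TRANSFERS_PER_SECOND)


def packet_sizes(sample_rate_hz, channels, transfers):
    """Payload size of each transfer, always a whole number of samples.

    Assumption: a running remainder decides when an extra sample is sent.
    """
    sizes = []
    remainder = 0  # leftover, in units of 1/TRANSFERS_PER_SECOND of a sample
    for _ in range(transfers):
        remainder += sample_rate_hz
        samples = remainder // TRANSFERS_PER_SECOND   # per channel, this transfer
        remainder -= samples * TRANSFERS_PER_SECOND
        sizes.append(samples * channels * BYTES_PER_SAMPLE)
    return sizes


def max_channels(sample_rate_hz):
    """Channels that fit in one transfer at the given sample rate.

    Assumption: rates that do not divide evenly are rounded up to the
    larger per-transfer sample count.
    """
    samples_per_channel = -(-sample_rate_hz // TRANSFERS_PER_SECOND)  # ceiling
    return MAX_SAMPLES_PER_TRANSFER // samples_per_channel


if __name__ == "__main__":
    print(average_bytes_per_transfer(96_000, 2))    # stereo at 96 kHz
    print(average_bytes_per_transfer(44_100, 2))    # stereo at CD rate
    print(packet_sizes(44_100, 2, 8))               # first few transfer sizes
    print(max_channels(48_000), max_channels(192_000))
```

Using `Fraction` keeps the 44.1 kHz case exact, which makes it easy to check that one second of 40-byte and 48-byte transfers adds up to the required 352,800 bytes.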
Transmitting digital audio introduces latency. In the case of high-speed USB this latency is 250 µs: a data packet is transferred once in each 125 µs window, but since it can be sent at any time within that window, a buffer of 250 µs is required. On top of this 250 µs delay, there may be additional delay in the operating system driver and the CODEC. Keep in mind that full-speed USB has a much higher inherent latency of 2 ms, as data is only sent once in each 1 ms window.

What is a Second Between Friends?

The big problem in digital audio is agreeing on a common notion of time. Above we stated that USB frames are transferred 8,000 times per second and set the speaker to play a sample 96,000 times per second. This only works if the speaker and the host agree on the length of one second. USB audio offers three modes that ensure the host and speaker agree on timing: In mode.