International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October-2013 1474
ISSN 2229-5518
Ms. S.G. Dighe¹ Prof. S. S. Gundal ² tiya.10588@gmail.com gunjalsheetal@yahoo.com
Mobile No.: 9881003517
¹ ² Amrutvahini College of Engineering, Sangamner,
1. Introduction
Since 1994, when the development of AAC in MPEG was initiated, five generations of AAC codecs have evolved. The codecs are designed to meet a wide range of needs in communications, broadcast and streaming. The AAC-ELD family consists of AAC-LD, AAC-ELD and AAC-ELD v2. These state-of-the-art MPEG-4 audio codecs are designed for maximum speech and audio quality at very low coding delay, which makes them well suited to professional and consumer communication applications. This paper introduces the three members of the AAC-ELD family and takes a closer look at how they tackle coding delay.
1.1 The members of the AAC-ELD family:-
Fig 1.1: AAC-ELD family.
AAC-LD (Low Delay AAC), AAC-ELD (Enhanced Low Delay AAC) and AAC-ELD v2 are optimized for a low algorithmic delay, which is essential for natural real-time communication. In contrast to common speech codecs, they extend the application area from clean voice to a broad variety of source material, including voice and singing, music and ambient sounds. Owing to their technical capabilities, the three members of the AAC-ELD family are used across the field of telecommunication, including Over-the-Top (OTT) services, video telephony, video conferencing and telepresence, as well as broadcast contribution services. Apple FaceTime is one example of a video telephony application that relies on the quality of AAC-ELD.
The codec is also natively included in the operating systems iOS, Android and Mac OS X. The AAC-ELD family delivers a level of audio quality called Full-HD Voice. Unlike Plain Old Telephone Service (POTS), ISDN and mobile phone calls, Full-HD Voice is intended to make calls sound as clear as talking to someone in the same room or listening to high-quality digital audio. This is possible because the codecs support the full audio bandwidth of 20 kHz.
In addition to the millions of calls already being made today using AAC-ELD, this technology is set to enable many new Full-HD Voice applications, including telepresence at home and mobile rich media telephony.
Each family member can be regarded as a superset of its predecessor: they share the same coding core and each adds new coding tools [Figure 1.1]. Software implementing the AAC-ELD codec family can therefore be expected to be fully backward compatible. The codecs can handle mono as well as stereo and multi-channel signals, all with latencies as low as 15 ms and over a wide range of sampling rates and bit-rates (down to 24 kbit/s).
The audio quality and operating points of the AAC-ELD family members are shown in Figure 1.2 for stereo audio. While AAC-LD is a very good choice for bit-rates above 96 kbit/s, AAC-ELD improves the audio quality down to 48 kbit/s. Below this bit-rate, AAC-ELD v2 is the best choice to keep the audio quality high. For mono applications, a similar relationship between AAC-ELD and AAC-LD can be expected at half the bit-rate, whereas AAC-ELD v2 delivers the same audio quality as AAC-ELD.
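The stereo operating points described above can be summarized as a simple lookup. The following is only an illustrative sketch: the function name `suggest_stereo_codec` is hypothetical, and the handling of the exact boundary values (96 and 48 kbit/s) is an assumption, since the text only gives approximate ranges.

```python
def suggest_stereo_codec(bitrate_kbps):
    """Suggest an AAC-ELD family member for a stereo bit-rate in kbit/s.

    Thresholds follow the operating points quoted in the text; the
    behaviour exactly at 96 and 48 kbit/s is an assumption.
    """
    if bitrate_kbps > 96:
        return "AAC-LD"
    if bitrate_kbps >= 48:
        return "AAC-ELD"
    return "AAC-ELD v2"


for rate in (128, 64, 32):
    print(rate, "kbit/s ->", suggest_stereo_codec(rate))
```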
Fig 1.2: AAC-ELD stereo operating points.
2. MPEG-4 AAC-LD:- Block diagram:-
Fig 2: AAC core encoder.
The core structure of AAC-LD is directly derived from AAC. The time domain input samples are transformed into a frequency domain representation by an MDCT (or a Low Delay MDCT in the case of AAC-ELD) filter bank. At a sampling rate of 48 kHz, the MDCT analysis window of 960 (or 1024) samples provides a frequency resolution of about 50 Hz and a time resolution of about 10 ms. These values are chosen to efficiently exploit the psychoacoustic effects of frequency and time domain masking.
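The quoted resolution figures can be checked with a short calculation. This sketch assumes a 48 kHz sampling rate (the rate used in the delay section of this paper) and the usual MDCT property that a window of N samples yields N/2 spectral coefficients per frame; the function name `mdct_resolution` is hypothetical.

```python
def mdct_resolution(window_length, sample_rate_hz):
    """Return (frequency resolution in Hz, time resolution in ms).

    Assumes an MDCT with a window of `window_length` samples, which
    produces window_length / 2 coefficients and advances by the same
    number of samples per frame.
    """
    coefficients = window_length // 2
    nyquist_hz = sample_rate_hz / 2
    freq_resolution_hz = nyquist_hz / coefficients
    time_resolution_ms = 1000 * coefficients / sample_rate_hz
    return freq_resolution_hz, time_resolution_ms


for window in (960, 1024):
    freq, time_ms = mdct_resolution(window, 48000)
    print(f"window {window}: {freq:.1f} Hz, {time_ms:.2f} ms")
```

For the 960-sample window this gives 50 Hz and 10 ms; the 1024-sample window gives slightly different values (roughly 47 Hz and 10.7 ms).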
As natural audio signals show diverse signal characteristics, specialized tools take care of them:
- Temporal Noise Shaping allows the AAC-LD coder to exercise control over the temporal fine structure of the audio signal and improve the time resolution.
- Intensity Coupling and Mid/Side Stereo increase the coding gain for a stereo channel pair compared to encoding two mono channels separately.
- Perceptual Noise Substitution (PNS) uses a parametric representation of noise-like frequency bands for an efficient transmission.
The codec can operate in a fixed frame length mode where every packet is equal in size, or in a fixed bit-rate mode where the average bit-rate within a limited time frame is constant.
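In the fixed frame length mode, the packet size follows directly from the bit-rate and the frame duration. The sketch below assumes a frame of 480 samples at 48 kHz (the framing size quoted in the delay section) and ignores any transport or container headers; the function name `payload_bytes_per_frame` is hypothetical.

```python
def payload_bytes_per_frame(bitrate_bps, frame_samples=480, sample_rate_hz=48000):
    """Audio payload per packet, in bytes, when every frame has equal size.

    Assumes one frame per packet and no header overhead.
    """
    bits_per_frame = bitrate_bps * frame_samples / sample_rate_hz
    return bits_per_frame / 8


for rate in (24000, 64000, 96000):
    print(rate // 1000, "kbit/s ->", payload_bytes_per_frame(rate), "bytes per frame")
```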
3. MPEG-4 AAC-ELD:-
Fig 3: AAC-ELD codec with Dual Rate SBR.
AAC-ELD is the most flexible codec of the family and can be adapted to the different needs of Full-HD Voice applications. To achieve this flexibility, AAC-ELD can be used in three different operating modes, all of them fully compatible with standard-compliant decoders:
1. AAC-ELD core: This mode can be used in all applications where high bit-rates are available, for example 96 kbit/s and more for a stereo signal. A Low Delay MDCT filter bank replaces the MDCT filter bank used in AAC-LD. With this delay-optimized filter bank, AAC-ELD operates with a lower delay than AAC-LD.
2. AAC-ELD with SBR: This is the most flexible mode of AAC-ELD, as it covers a very wide range of bit-rates (approximately 32 to 64 kbit/s per channel) and sampling rates. It is therefore the preferred mode for video telephony applications such as Apple FaceTime. The delay stays constant over a wide range of bit-rates, which enables dynamic switching of bit-rates without causing delay variations. In MPEG documents, this mode is typically called “down sampled mode”. It adds a delay-optimized version of Spectral Band Replication (LD-SBR) technology to the AAC-ELD core. LD-SBR allows the overall bit-rate to be reduced while maintaining excellent audio quality. The lower part of the audio spectrum is coded with the AAC-ELD core, while the LD-SBR tool encodes the upper part of the spectrum. LD-SBR is a parametric approach that exploits the harmonic structure of natural audio signals. It uses the relationship between the lower and upper parts of the spectrum for a guided recreation of the whole audio spectrum of the signal.
3. AAC-ELD with Dual Rate SBR: For applications demanding even lower data rates, for example live broadcast contribution, the “Dual Rate SBR” mode can be used. It is the most bit-rate efficient mode, enabling bit-rates down to 24 kbit/s per channel at an increased delay compared to the other two modes. In this mode, the LD-SBR tool is again added to AAC-ELD, but the AAC-ELD core is coded at half the sampling frequency of the overall signal instead of at the full sampling rate. This results in the best possible audio quality at very low bit-rates. The structure of an AAC-ELD codec with Dual Rate SBR is shown in Figure 3.
Every standard-compliant AAC-ELD decoder can operate in any of the three modes, which allows the designer of the encoder side to freely choose the mode that best fits the application scenario. The audio quality of AAC-ELD has been confirmed in several independent listening tests.
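The difference between the three operating modes can be captured in a small data structure. This is an illustrative sketch only: the class `EldMode`, the dictionary keys and the field names are hypothetical, and the only facts taken from the text are whether LD-SBR is used and whether the core runs at the full or at half the overall sampling rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EldMode:
    name: str
    uses_ld_sbr: bool
    core_rate_divisor: int  # 1 = core at full rate, 2 = core at half rate


MODES = {
    "core": EldMode("AAC-ELD core", False, 1),
    "sbr": EldMode("AAC-ELD with SBR", True, 1),
    "dual_rate_sbr": EldMode("AAC-ELD with Dual Rate SBR", True, 2),
}


def core_sample_rate(mode_key, output_rate_hz):
    """Sampling rate at which the AAC-ELD core operates in the given mode."""
    return output_rate_hz // MODES[mode_key].core_rate_divisor


for key, mode in MODES.items():
    print(mode.name, "-> core at", core_sample_rate(key, 48000), "Hz")
```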
Fig 3.5: Minimum bit-rate for excellent audio quality.
4. MPEG-4 AAC-ELD v2:-
Fig 4: AAC-ELD v2
To achieve stereo performance at bit-rates close to monophonic operation, a parametric stereo extension has been integrated into AAC-ELD v2. This parametric extension is based on a 2-channel version of Low Delay MPEG Surround (LD-MPS) that further reduces the bit-rate. Instead of transmitting two channels, the LD-MPS encoder extracts spatial parameters to enable reconstruction of the stereo signal at the decoder side; the remaining mono downmix is AAC-ELD encoded. The LD-MPS data is transmitted together with the SBR data in the AAC-ELD bit stream. The AAC-ELD decoder reconstructs the mono signal and the LD-MPS decoder recreates the stereo image. Typically, the bit-rate overhead for the stereo parameters is around 3 kbit/s at 48 kHz. This allows AAC-ELD v2 to code stereo signals at bit-rates significantly lower than with discrete stereo coding.
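A rough comparison illustrates the saving. The sketch below uses the approximately 3 kbit/s parameter overhead quoted above; treating discrete stereo as twice the mono bit-rate is a simplifying assumption (based on the "half bit-rate" relationship mentioned for mono in Section 1.1), and both function names are hypothetical.

```python
STEREO_PARAM_OVERHEAD_KBPS = 3  # typical LD-MPS overhead quoted in the text


def parametric_stereo_kbps(mono_kbps):
    """Bit-rate of a mono downmix plus LD-MPS stereo parameters."""
    return mono_kbps + STEREO_PARAM_OVERHEAD_KBPS


def discrete_stereo_kbps(mono_kbps):
    """Assumed bit-rate of discrete stereo: two channels at the mono rate."""
    return 2 * mono_kbps


mono = 24
print("parametric stereo:", parametric_stereo_kbps(mono), "kbit/s")
print("discrete stereo (assumed):", discrete_stereo_kbps(mono), "kbit/s")
```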
5. Tackling the Delay Issue:-
In a face-to-face conversation, delays in response can be interpreted in a variety of ways, including hesitation, requiring time to think, or not wanting to give an answer. However, if the other party actually responds immediately but a delay is introduced by technical shortcomings, misinterpretations can happen very quickly and the conversation can become awkward and frustrating. Therefore, it is very important to keep these delays, also called latencies, low: the end-to-end delay should not exceed about 150 to 200 ms.
The end-to-end delay of a VoIP call accumulates over several processing steps and components, such as echo cancellation, noise suppression, automatic gain control, routers, jitter buffer and speech/audio coding. As it is very important to maintain a low total latency, every component must use its share of the delay budget sparingly. AAC-ELD is well suited in this regard, as it contributes only 15 to 32 ms, depending on the bit-rate and sampling rate used.
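The idea of a shared delay budget can be sketched as a simple sum. In the example below only the 15 ms codec contribution and the 150 ms target come from the text; every other component value is a hypothetical placeholder chosen purely for illustration, as are the names `total_delay_ms` and `within_budget`.

```python
def total_delay_ms(components):
    """Sum the per-component delays (in ms) of an end-to-end chain."""
    return sum(components.values())


def within_budget(components, budget_ms=150):
    """Check the chain against the lower end of the 150 to 200 ms target."""
    return total_delay_ms(components) <= budget_ms


# Hypothetical example chain; only the codec figure is taken from the text.
chain = {
    "audio codec (AAC-ELD core)": 15,
    "echo cancellation": 10,
    "noise suppression": 5,
    "automatic gain control": 2,
    "network and routers": 60,
    "jitter buffer": 40,
}

print("total:", total_delay_ms(chain), "ms")
print("within 150 ms budget:", within_budget(chain))
```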
1. Delay of AAC-LD:
The only sources of AAC-LD algorithmic delay for an IP based transmission are the overlap-add of the MDCT filter bank, which generates a delay of 480 samples, and the framing (audio input buffering), which adds another 480 samples. This corresponds to a minimum algorithmic delay of 20 ms at a sampling rate of 48 kHz.
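The 20 ms figure follows from converting the two sample counts to time. A minimal check, with `samples_to_ms` as a hypothetical helper name:

```python
def samples_to_ms(samples, sample_rate_hz):
    """Convert a delay given in samples to milliseconds."""
    return 1000 * samples / sample_rate_hz


FRAMING_SAMPLES = 480      # audio input buffering
OVERLAP_ADD_SAMPLES = 480  # MDCT filter bank overlap

aac_ld_delay_ms = samples_to_ms(FRAMING_SAMPLES + OVERLAP_ADD_SAMPLES, 48000)
print("AAC-LD algorithmic delay:", aac_ld_delay_ms, "ms")
```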
2. Delay of AAC-ELD:
In AAC-ELD core mode, the overlap-add delay of the filter bank is cut in half to 240 samples, resulting in a very low delay of 15 ms. In the AAC-ELD with SBR mode, the SBR tool adds only a small delay of 64 (or 32) samples, which leads to a delay of 15.7 ms. Finally, the dual rate mode achieves the best coding efficiency and still ends up with a delay of only 31.3 ms.
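The three delay figures can be reproduced under a few assumptions that the text does not state explicitly: an output sampling rate of 48 kHz, 480 samples of framing as for AAC-LD, an SBR delay of 32 samples in the SBR mode and 64 samples in the dual rate mode, and a core running at 24 kHz in the dual rate mode. The breakdown below is therefore one plausible reading, not a statement of how the standard defines the delay.

```python
def samples_to_ms(samples, sample_rate_hz):
    """Convert a delay given in samples to milliseconds."""
    return 1000 * samples / sample_rate_hz


OUTPUT_RATE_HZ = 48000
CORE_SAMPLES = 480 + 240  # framing + halved overlap-add delay

# Core mode: core runs at the output rate.
core_ms = samples_to_ms(CORE_SAMPLES, OUTPUT_RATE_HZ)

# SBR mode: core at the output rate plus an assumed 32-sample SBR delay.
sbr_ms = core_ms + samples_to_ms(32, OUTPUT_RATE_HZ)

# Dual rate mode: core at half the output rate plus an assumed
# 64-sample SBR delay at the output rate.
dual_rate_ms = (samples_to_ms(CORE_SAMPLES, OUTPUT_RATE_HZ // 2)
                + samples_to_ms(64, OUTPUT_RATE_HZ))

for label, value in (("core", core_ms), ("SBR", sbr_ms), ("dual rate SBR", dual_rate_ms)):
    print(f"{label}: {value:.1f} ms")
```

Under these assumptions the results round to the 15 ms, 15.7 ms and 31.3 ms quoted above.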
3. Delay of AAC-ELD v2:
With AAC-ELD v2, the Low Delay MPEG Surround tool is incorporated in such a way that it only causes a small filter bank delay at the decoder end. If the core coder operates in the AAC-ELD core mode, the additional delay is 5.3 ms (at a sampling rate of 48 kHz). If the core codec operates in a mode with LD-SBR, the additional delay can be reduced to 4 ms. This results in a typical algorithmic delay of 35 ms.
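The typical 35 ms figure is consistent with adding the 4 ms LD-MPS delay to the 31.3 ms of the dual rate mode. That pairing is an assumption, since the text does not say which base mode the "typical" value refers to; the sketch below simply adds the quoted overheads to the quoted base delays.

```python
# Base delays (ms) quoted for the AAC-ELD modes in the previous subsection.
BASE_DELAY_MS = {"core": 15.0, "sbr": 15.7, "dual_rate_sbr": 31.3}

# Additional LD-MPS delay (ms) quoted for AAC-ELD v2.
LD_MPS_EXTRA_MS = {"core": 5.3, "sbr": 4.0, "dual_rate_sbr": 4.0}


def eld_v2_delay_ms(mode):
    """Estimated AAC-ELD v2 algorithmic delay for a given core mode."""
    return BASE_DELAY_MS[mode] + LD_MPS_EXTRA_MS[mode]


for mode in BASE_DELAY_MS:
    print(f"{mode}: {eld_v2_delay_ms(mode):.1f} ms")
```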
Fig 5: Delay of MPEG audio codecs.
6. APPLICATION:-
1. Video telephony: Apple FaceTime.
2. Videoconferencing/Telepresence: Cisco, Tandberg, Polycom.
3. Operating systems: iOS, Android, Mac OS X.
4. Broadcast contribution: Telos.
5. Standards: TIP, ETSI/DECT, OIPF, N/ACIP.
This paper has reviewed the methods of MPEG-4 AAC-ELD and the techniques they use. Already widely used in professional and consumer communication applications, the AAC Low Delay codecs are state-of-the-art MPEG-4 audio codecs for maximum speech and audio quality at very low coding delay. By supporting the full audio bandwidth of 20 kHz, they are able to deliver Full-HD Voice audio quality to IP-communication applications and devices. Each codec can be regarded as a superset of its predecessor, as they share the same coding core and each adds new coding tools. Software implementing the AAC-ELD codec family can be expected to be fully backward compatible.
Acknowledgements
The goal of this paper, “Review methods of MPEG-4 AAC-ELD”, was to review the methods of MPEG-4 AAC-ELD and the techniques they use. This goal has been achieved.
I express my sincere gratitude and appreciation to Prof. S. S. Gundal, the supervisor of my dissertation work, who did her best to help me. Without her help and guidance I could not have brought the theory into practice.
I also want to thank all my family members and friends for their constant support and motivation.
Thank you very much!
IJSER © 2013 http://www.ijser.org