First-Hand:Pitch Determination
Submitted by A. Michael Noll
September 25, 2025
Starting in mid-1961, I was employed as a Member of Technical Staff (MTS) at Bell Telephone Laboratories, Inc. (Bell Labs) in Murray Hill, New Jersey. My office was on the 4th floor of Building 2. I was studying human reactions to telephone parameters, such as sidetone and peak clipping. I also was in a Bell Labs program to obtain a Master's degree, and, as part of that program, I was assigned during the summer of 1962 to the Acoustics Research department on the 5th floor of Building 2 – just up one flight of stairs.
A challenge back then was determining from seismic readings whether an incident was a natural earthquake or an underground nuclear test. Research on how to do this was being conducted at the Bell Labs facility in Whippany, New Jersey, when Dr. Bruce P. Bogert noticed ripples in the spectrum of the seismic signal. Dr. John W. Tukey suggested taking the logarithm of the spectrum to make the ripples more sinusoidal. They named this spectrum of the log spectrum the “cepstrum.”[1] They applied the technique to seismic signals to determine depth, but it was not effective.
An internal Bell Labs memorandum described their cepstrum technique. Dr. Manfred R. Schroeder, in the speech research area of Bell Labs, saw the memorandum and realized the method might work for determining the pitch of human speech, if applied to short segments of the speech signal (what he called short-time spectrum analysis).
In my 1962 summer assignment to the Acoustics Research area, I reported to Schroeder, who was then a department head. He described his idea to use cepstrum analysis of the speech signal to determine pitch, including how to simulate an analysis using sine waves to analyze the signal. He told me to program it using the block diagram compiler (BLODI). He then went on vacation. I was overwhelmed, but colleagues helped me understand and unravel his ideas. I programmed it all – and it worked. I knew little of the past challenges and failures in determining pitch – but the cepstrum solved all these past challenges.
I wrote a paper about the technique, and it was published in the Journal of the Acoustical Society of America.[2] Even though my summer assignment was over and I was back on the 4th floor, I continued to work on cepstrum pitch determination. I wrote a program to pick the peak in the cepstrum, and the method was patented. It was too late to patent the cepstrum itself, though. I used the pitch data in speech synthesizers, and the synthesized speech sounded quite natural. Ultimately, I was transferred full-time to the Acoustics Research area on the 5th floor as an MTS, even though I did not have a doctorate. I even had my own private office.
I abandoned the awkward BLODI block diagram analysis and wrote a FORTRAN program to do the analysis using numerical techniques.[3] The computation was expensive – about an hour of IBM 7094 time to analyze a few seconds of speech. Our Executive Director Dr. John R. Pierce remarked that “a complicated system that works is far better than a simpler one that does not.”
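The numerical cepstrum analysis described above can be sketched in modern NumPy – a minimal illustration, not the original BLODI or FORTRAN implementation; the synthetic test frame, window choice, and pitch search range are my assumptions. The cepstrum is computed as the spectrum of the log magnitude spectrum, and the pitch is read from the cepstral peak in the range of plausible pitch periods:

```python
import numpy as np

def cepstrum_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate pitch by taking the spectrum of the log magnitude
    spectrum (the cepstrum) and picking the peak in the quefrency
    range of plausible pitch periods."""
    windowed = frame * np.hamming(len(frame))
    log_spectrum = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.abs(np.fft.irfft(log_spectrum))
    qmin = int(sample_rate / fmax)  # shortest plausible pitch period (samples)
    qmax = int(sample_rate / fmin)  # longest plausible pitch period (samples)
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return sample_rate / peak       # pitch in Hz

# Synthetic voiced frame: five harmonics of a 200 Hz fundamental
fs = 8000
t = np.arange(1024) / fs
frame = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 6))
pitch = cepstrum_pitch(frame, fs)   # expect roughly 200 Hz
```

A periodic voiced frame produces evenly spaced harmonics in the log spectrum; that ripple transforms to a sharp peak at the quefrency equal to the pitch period, which is what the peak picker exploits.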
I saw patterns in the sine waves used for the analyses, and almost stumbled upon what would become the fast Fourier transform (FFT). I went on to discover other pitch determination methods. One method was based on graphical observations of patterns in the spectrum of the speech signal. Bell Labs mathematician Dr. David Slepian showed that my harmonic product spectrum was a maximum likelihood estimate of the pitch. I wrote a very mathematical paper about all these pitch determination methods, and presented it at a symposium at the Polytechnic Institute of Brooklyn (later published in its proceedings).[4]
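The harmonic product spectrum mentioned above can be sketched as follows – again a minimal NumPy illustration under my own assumptions (number of harmonics, window, search range), not the original formulation. The magnitude spectrum is multiplied by frequency-compressed (downsampled) copies of itself, so that the harmonics all reinforce at the fundamental:

```python
import numpy as np

def harmonic_product_spectrum(frame, sample_rate, n_harmonics=5,
                              fmin=50.0, fmax=500.0):
    """Pitch estimate via the harmonic product spectrum: multiply the
    magnitude spectrum by downsampled copies of itself so harmonics
    line up and reinforce at the fundamental frequency."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    hps = spectrum.copy()
    for h in range(2, n_harmonics + 1):
        n = len(spectrum) // h
        hps[:n] *= spectrum[::h][:n]   # compress spectrum by factor h
    df = sample_rate / len(frame)             # Hz per spectral bin
    lo, hi = int(fmin / df), int(fmax / df)   # plausible pitch range
    return (lo + np.argmax(hps[lo:hi])) * df

# Synthetic voiced frame: five harmonics of a 200 Hz fundamental
fs = 8000
t = np.arange(1024) / fs
frame = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 6))
pitch = harmonic_product_spectrum(frame, fs)   # expect roughly 200 Hz
```

Compressing the frequency axis by a factor of h maps the h-th harmonic down onto the fundamental, so the product is large only at the true pitch frequency.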
Systems to analyze and synthesize speech (called vocoders) were very important to creating secure speech communication, in addition to saving bandwidth in general. Accurate pitch determination was essential to creating natural-sounding synthesized speech. Cepstrum pitch determination was used for many years and became something of a “gold standard.”
I was never very good at equations and saw things graphically. Schroeder was a physicist and used equations to understand things. I realize now that we were a good team, although his brilliance and knowledge always amazed me.
I used the cepstrum pitch method to analyze the speech of the three Apollo astronauts in an attempt to determine who was speaking at the end of the disaster. This motivated a research project to determine the effects of peak clipping on the intelligibility of shouted speech, including the effects on pitch (conducted with D. Jack MacLean).[5] I also used cepstrum pitch analysis to compare speech communication face-to-face versus over two-way interactive video, and concluded that the video was a more tense medium.[6]
In the early 1980’s, I taught two doctoral level courses about speech analysis for the Speech Science and Technology program (founded and directed by Prof. June E. Shoup) at the University of Southern California.
Once I had finished my explorations of pitch determination at Bell Labs, my curiosity was satisfied, and I did not make a career of exploring it in greater depth. I went on to other things at Bell Labs, such as interactive 3D graphics, and in mid-1971 left Bell Labs to go to Washington to work at the office of the President’s Science Advisor. My career made many interesting twists and turns.
References
- ↑ B. P. Bogert, M. J. R. Healy, and J. W. Tukey, “The Quefrency Analysis of Time Series for Echoes.” Proceedings of the Symposium on Time Series Analysis (Chapter 15), John Wiley & Sons, 1963, pp. 209-243.
- ↑ A. M. Noll, “Short-Time Spectrum and Cepstrum Techniques for Vocal-Pitch Detection,” J. Acoustical Society of America, Vol. 36 (1964), pp. 296-302.
- ↑ A. Michael Noll, “Cepstrum Pitch Determination,” Journal of the Acoustical Society of America, Vol. 41, No. 2 (February 1967), pp. 293-309.
- ↑ A. M. Noll, “Pitch Determination of Human Speech by the Harmonic Product Spectrum, the Harmonic Sum Spectrum, and a Maximum Likelihood Estimate,” Proceedings of the Symposium on Computer Processing in Communications, Vol. XIX, Polytechnic Press: Brooklyn, New York (1970), pp. 779-797.
- ↑ A. Michael Noll, (with D.J. MacLean), “The Intelligibility of Shouted Speech,” Proceedings of the Symposium on the Aeromedical Aspects of Radio Communication and Flight Safety, AGARD/NATO Advisory Report 19, pp. 10-1 to 10-13, December 1969 (London).
- ↑ A. Michael Noll, “The Effects of Communications Medium on the Fundamental Frequency of Speech,” Communications Quarterly, Vol. 26, No. 2 (Spring 1978), pp. 51-56.