Comparison of the New Edition to the Previous Edition of the Book
Minor and major changes as well as corrections have been made throughout the new
edition of the book. Here are the major changes:
1. Chapter 5 on the Method of Stochastic Gradient Descent is new.
2. In Chapter 6 (old Chapter 5) on the Least-Mean-Square (LMS) algorithm, major
changes have been made to the statistical learning theory of LMS in light of the
Langevin equation and the related Brownian motion.
3. Chapter 11 on Robustness is new.
4. The second half of Chapter 13 on Adaptation in Nonstationary Environments
is completely new, being devoted to the Incremental-Delta-Bar-Delta (IDBD)
Algorithm and the Autostep Method.
5. Appendices B and F on the Wirtinger Calculus and the Langevin Equation, respectively, are new.
6. The Bibliography is new.
7. The chapters on Adaptive IIR and Complex Neural Networks in the old edition
have been deleted.
Introductory Remarks on the New Edition
The subject of adaptive filters constitutes an important part of statistical signal processing. Whenever there is a requirement to process signals that result from operation in an
environment of unknown statistics or one that is inherently nonstationary, the use of
an adaptive filter offers a highly attractive solution to the problem as it provides a significant improvement in performance over the use of a fixed filter designed by conventional methods. Furthermore, the use of adaptive filters provides new signal-processing
capabilities that would not be possible otherwise. We thus find that adaptive filters have
been successfully applied in such diverse fields as communications, control, radar, sonar,
seismology, and biomedical engineering, among others.
Aims of the Book
The primary aim of this book is to develop the mathematical theory of various
realizations of linear adaptive filters. Adaptation is accomplished by adjusting the free
parameters (coefficients) of a filter in accordance with the input data, which, in reality,
makes the adaptive filter nonlinear. When we speak of an adaptive filter being “linear,”
we mean the following:
The input-output map of the filter obeys the principle of superposition whenever,
at any particular instant of time, the filter’s parameters are all fixed.
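As a simple illustration of this definition, consider a finite-duration impulse response (FIR) filter whose M tap weights are momentarily frozen (the conjugated-weight form shown is the complex notation commonly used for such filters):

    y(n) = \sum_{k=0}^{M-1} w_k^* u(n-k).

Because y(n) depends linearly on the input samples u(n), u(n-1), ..., u(n-M+1), scaling the input scales the output, and the response to a sum of inputs is the sum of the individual responses; superposition fails only once the weights are allowed to vary with the data.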
There is no unique solution to the linear adaptive filtering problem. Rather, we have
a “kit of tools” represented by a variety of recursive algorithms, each of which offers
desirable features of its own. This book provides such a kit.
In terms of background, it is assumed that the reader has taken introductory undergraduate courses on probability theory and digital signal processing; undergraduate
courses on communication and control systems would also be an advantage.
Organization of the Book
The book begins with an introductory chapter, where the operations and different forms
of adaptive filters are discussed in general terms. The chapter ends with historical notes,
which are included to provide a source of motivation for the interested reader to plough
through the rich history of the subject.
The main chapters of the book, 17 in number, are organized as follows:
1. Stochastic processes and models, which are covered in Chapter 1. This chapter
emphasizes partial characterization (i.e., second-order statistical description) of
stationary stochastic processes. As such, it is basic to much of what is presented in
the rest of the book.
2. Wiener filter theory and its application to linear prediction, which are discussed in
Chapters 2 and 3. The Wiener filter, presented in Chapter 2, defines the optimum
linear filter for a stationary environment and therefore provides a framework for
the study of linear adaptive filters. Linear prediction theory, encompassing both
its forward and backward forms and variants thereof, is discussed in Chapter 3;
the chapter finishes with the application of linear prediction to speech coding.
3. Gradient-descent methods, which are covered in Chapters 4 and 5. Chapter 4 presents the fundamentals of an old optimization technique known as the method
of steepest descent, which is of a deterministic kind; this method provides the
framework for an iterative evaluation of the Wiener filter. In direct contrast,
the follow-up chapter, Chapter 5, presents the fundamentals of the method of
stochastic gradient descent, which is well suited to nonstationary environments;
the applicability of this second method is illustrated by deriving the
least-mean-square (LMS) and gradient adaptive lattice (GAL) algorithms. (A
minimal sketch of the LMS recursion appears after this list.)
4. Family of LMS algorithms, which occupies Chapters 6, 7, and 8:
• Chapter 6 begins with a discussion of different applications of the LMS algorithm, followed by a detailed account of the small step-size statistical theory.
This new theory, rooted in the Langevin equation of nonequilibrium thermodynamics, provides a fairly accurate assessment of the transient behavior of
the LMS algorithm; computer simulations are presented to justify the practical validity of the theory.
• Chapters 7 and 8 expand on the traditional LMS algorithm by presenting
detailed treatments of the normalized LMS algorithm, affine projection
adaptive filtering algorithms, and frequency-domain and subband adaptive
LMS filtering algorithms. The affine projection algorithm may be viewed as
an intermediate between the LMS and recursive least-squares (RLS) algorithms; the latter algorithm is discussed next.
5. Method of least squares and the RLS algorithm, which occupy Chapters 9 and
10. Chapter 9 discusses the method of least squares, which may be viewed as the
deterministic counterpart of the Wiener filter rooted in stochastic processes. In the
method of least squares, the input data are processed on a block-by-block basis;
block methods, disregarded in the past because of their numerical complexity, are
becoming increasingly attractive, thanks to continuing improvements in computer
technology. Chapter 10 builds on the method of least squares to derive the RLS
algorithm, followed by a detailed statistical theory of its transient behavior. (The
RLS recursion is also included in the sketch following this list.)
6. Fundamental issues, addressing robustness in Chapter 11, finite-precision effects in
Chapter 12, and adaptation in nonstationary environments in Chapter 13:
• Chapter 11 begins by introducing the H∞-theory, which provides the mathematical basis of robustness. With this theory at hand, it is shown that the
LMS algorithm is indeed robust in the H∞ sense, provided the chosen step-size parameter is small, whereas the RLS algorithm is less robust when both
algorithms operate in a nonstationary environment in the face of internal
as well as external disturbances. This chapter also discusses the trade-off
between deterministic robustness and statistical efficiency.
• The theory of linear adaptive filtering algorithms presented in Chapters 5
through 10 is based on continuous mathematics (i.e., infinite precision). When,
however, any adaptive filtering algorithm is implemented in digital form,
effects due to the use of finite-precision arithmetic arise. Chapter 12 discusses
these effects in the digital implementation of LMS and RLS algorithms.
• Chapter 13 expands on the theory of LMS and RLS algorithms by evaluating and comparing their performances when they operate in a nonstationary environment, assuming a Markov model. The second part of this
chapter is devoted to two new algorithms: first, the incremental-delta-bar-delta (IDBD) algorithm, which expands on the traditional LMS algorithm
by vectorizing the step-size parameter, and second, the Autostep method,
which builds on the IDBD algorithm to experimentally formulate an adaptive procedure that bypasses the need for manual tuning of the step-size
parameter.
7. Kalman filter theory and related adaptive filtering algorithms, which occupy
Chapters 14, 15, and 16:
• In reality, the RLS algorithm is a special case of the celebrated Kalman
filter, which is covered in Chapter 14. A distinct feature of the Kalman filter
is its emphasis on the notion of a state. Moreover, when the environment
is stationary, the Kalman filter also includes the Wiener filter as a special case. It
is therefore important that we have a good understanding of Kalman filter
theory, especially given that covariance filtering and information filtering
algorithms are variants of the Kalman filter.
• Chapter 15 builds on the covariance and information filtering algorithms to
derive their respective square-root versions. To be more specific, the ideas of
prearray and postarray are introduced, which facilitate the formulation of a
new class of adaptive filtering algorithms structured around systolic arrays
whose implementations involve the use of Givens rotations.
• Chapter 16 is devoted to yet another new class of order-recursive least-squares lattice (LSL) filtering algorithms, which again build on the covariance and information algorithmic variants of the Kalman filter. For their
implementation, they exploit a numerically robust method known as QR-decomposition. Another attractive feature of the order-recursive LSL filtering algorithms is the fact that their computational complexity follows a
linear law. However, all the nice features of these algorithms are attained
at the expense of a highly elaborate framework in mathematical as well as
coding terms.
8. Unsupervised (self-organized) adaptation, which is featured in the last chapter of
the book—namely, Chapter 17 on blind deconvolution. The term “blind” is used
herein to express the fact that the adaptive filtering procedure is performed without the assistance of a desired response. This hard task is accomplished by
exploiting a model that appeals to the following notions:
• Subspace decomposition, covered in the first part of the chapter, provides a clever but mathematically demanding approach for solving the blind equalization
problem. To arrive at the solution, use is made of cyclostationarity—an inherent
characteristic of communication systems—for finding the second-order statistics
of the channel input so as to equalize the channel in an unsupervised manner.
• High-order statistics, covered in the second part of the chapter, can be of an
explicit or implicit kind. It is the latter approach that this part of the chapter
addresses in deriving a class of blind equalization algorithms, collectively
called Bussgang algorithms. This second part of the chapter also includes
a new blind equalization algorithm based on an information-theoretic
approach that is rooted in the maximum entropy method.
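To make items 3 through 5 above concrete, the following minimal MATLAB sketch places the LMS and RLS recursions side by side in a simple system-identification setting. The signal model, parameter values, and variable names are illustrative assumptions only; they are not taken from the book or its computer experiments.

    % Minimal sketch: LMS and RLS identifying an unknown FIR system.
    % All parameter values are illustrative, not from the book.
    M     = 4;                        % number of taps
    N     = 5000;                     % number of samples
    w_o   = [1; -0.5; 0.25; 0.1];     % "unknown" system to be identified
    mu    = 0.01;                     % LMS step-size parameter
    lam   = 0.999;                    % RLS forgetting factor
    delta = 0.01;                     % RLS regularization constant

    u = randn(N,1);                          % white input
    d = filter(w_o,1,u) + 0.01*randn(N,1);   % desired response plus noise

    w_lms = zeros(M,1);               % LMS weight vector
    w_rls = zeros(M,1);               % RLS weight vector
    P     = (1/delta)*eye(M);         % inverse-correlation estimate

    for n = M:N
        un = u(n:-1:n-M+1);           % tap-input vector

        % LMS: a stochastic-gradient step along the instantaneous error
        e_lms = d(n) - w_lms'*un;
        w_lms = w_lms + mu*un*e_lms;

        % RLS: exact recursive solution of the least-squares problem
        k     = (P*un)/(lam + un'*P*un);   % gain vector
        e_rls = d(n) - w_rls'*un;
        w_rls = w_rls + k*e_rls;
        P     = (P - k*(un'*P))/lam;
    end

    disp([w_o w_lms w_rls])           % the three columns should nearly agree

Note how the LMS update costs on the order of M operations per sample, while the RLS update costs on the order of M^2; this complexity gap, together with the robustness-versus-efficiency trade-off discussed in Chapter 11, is a recurring theme of the book.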
The main part of the book concludes with an Epilogue that has two parts:
• The first part looks back on the material covered in previous chapters, with some
final summarizing remarks on robustness, efficiency, and complexity, and how the
LMS and RLS algorithms feature in the context of these three fundamentally
important issues of engineering.
• The second part of the Epilogue looks forward by presenting a new class of nonlinear adaptive filtering algorithms based on the use of kernels (playing the role
of a hidden layer of computational units). These kernels are rooted in the reproducing kernel Hilbert space (RKHS), and the motivation here is to build on
material that is well developed in the machine learning literature. In particular, attention
is focused on kernel LMS filtering, in which the traditional LMS algorithm plays a
key role; the attributes and limitations of this relatively new way of thinking about
adaptive filtering are briefly discussed.
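To give this closing idea a little substance, here is a minimal MATLAB sketch of a kernel LMS filter with a Gaussian kernel. The naive dictionary growth (one kernel center per input sample, with no sparsification), the kernel width, and the step size are illustrative assumptions, not the book's prescription.

    % Minimal kernel LMS sketch (Gaussian kernel); illustrative only.
    sigma = 1.0;                      % kernel width (assumed value)
    mu    = 0.2;                      % step-size parameter (assumed value)
    N     = 1000;
    u = randn(N,1);                   % scalar input stream
    d = sin(2*u) + 0.05*randn(N,1);   % nonlinear target plus noise

    centers = [];                     % stored inputs (the "hidden layer")
    alpha   = [];                     % corresponding coefficients

    for n = 1:N
        % Predicted output: weighted sum of kernel evaluations
        if isempty(centers)
            y = 0;
        else
            y = alpha' * exp(-(centers - u(n)).^2/(2*sigma^2));
        end
        e = d(n) - y;
        % Every input becomes a new center, weighted by the scaled error;
        % practical variants prune or quantize this growing dictionary.
        centers = [centers; u(n)];
        alpha   = [alpha; mu*e];
    end

The recursion is the traditional LMS update carried out implicitly in the RKHS: the weight vector is never formed explicitly, and the filter output is evaluated through kernel sums instead.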
The book also includes appendices on the following topics:
• Complex variable theory
• Wirtinger Calculus
• Method of Lagrange multipliers
• Estimation theory
• Eigenanalysis
• The Langevin equation
• Rotations and reflections
• Complex Wishart distribution
In different parts of the book, use is made of the fundamental ideas presented in these
appendices.
Ancillary Material
• A Glossary is included, consisting of a list of definitions, notations and conventions, a list of abbreviations, and a list of principal symbols used in the book.
• All publications referred to in the text are compiled in the Bibliography. Each
reference is identified in the text by the name(s) of the author(s) and the year of
publication. A Suggested Readings section is also included, listing many additional references for further study.
Examples, Computer Experiments, and Problems
Many examples are included in different chapters of the book to illustrate concepts and
theories under discussion.
The book also includes many computer experiments that have been developed to
illustrate the underlying theory and applications of the LMS and RLS algorithms. These
experiments help the reader to compare the performances of different members of these
two families of linear adaptive filtering algorithms.
Each chapter of the book, except for the introductory chapter, ends with problems
that are designed to do two things:
• Help the reader to develop a deeper understanding of the material covered in the
chapter.
• Challenge the reader to extend some aspects of the theory discussed in the
chapter.
Solutions Manual
The book has a companion solutions manual that presents detailed solutions to all the
problems at the end of Chapters 1 through 17 of the book. A copy of the manual can
be obtained by instructors who have adopted the book for classroom use by writing
directly to the publisher.
The MATLAB codes for all the computer experiments can be accessed by going
to the web site http://www.pearsoninternationaleditions.com/haykin/.
Two Noteworthy Symbols
Typically, the square root of minus one is denoted by the italic symbol j, and the differential operator (used in differentiation as well as integration) is denoted by the italic
symbol d. In reality, however, both of these terms are operators, each in its own way; it
is therefore incorrect to use italic symbols for their notations. Furthermore, the italic
symbol j and the italic symbol d are also frequently used as indices to represent other
matters, thereby raising the potential for confusion. Accordingly, throughout the book,
the roman symbol j and the roman symbol d are used to denote the square root of minus
one and the differential operator, respectively.
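For readers who typeset their own notes in LaTeX, the upright forms described above can be obtained as follows (a minimal illustration; the book does not prescribe any particular typesetting commands):

    $ z = x + \mathrm{j}y $  and  $ \int f(t)\,\mathrm{d}t $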
Use of the Book
The book is written at a level suitable for use in graduate courses on adaptive signal
processing. In this context, it is noteworthy that the organization of the material covered
in the book offers a great deal of flexibility in the selection of a suitable list of topics for
such a graduate course.
It is hoped that the book will also be useful to researchers and engineers in industry as well as government establishments, working on problems relating to the theory
and applications of adaptive filters.