Index: trunk/doc/stat_methods.pdf
===================================================================
Cannot display: file marked as a binary type.
svn:mimetype = application/pdf
Index: trunk/doc/stat_methods.tex
===================================================================
 trunk/doc/stat_methods.tex (revision 674)
+++ trunk/doc/stat_methods.tex (revision 675)
@@ 1,3796 +1,3801 @@
\documentclass[12pt,titlepage]{article}
\usepackage{fullpage}
\usepackage{amsmath}
\usepackage{makeidx}
\usepackage{epsfig}
\usepackage[hyperfootnotes=false]{hyperref}
\usepackage{longtable}
\usepackage[width=.85\textwidth]{caption}
\newcommand{\sub}[1]{\ensuremath{_{\mbox{\scriptsize \,#1}}}}
\newcommand{\supers}[1]{\ensuremath{^{\mbox{\scriptsize #1}}}}
\newcommand{\cname}[1]{\index{#1}\textsf{#1}}
\newcommand{\E}{\ensuremath{\mathbb{E}}}
\newcommand{\V}{\ensuremath{\mathbb{V}}}
\newcommand{\R}{\ensuremath{\mathbb{R}}}
\newcommand{\etal}{\emph{et al.}}
\newcommand{\hf}{\hat{f}_N}
\newcommand{\xf}{x\sub{fit}}
\newcommand{\scriptname}[1]{\texttt{#1}}
\newcommand{\at}{\tilde{a}\sub{fit}}
\newcommand{\bt}{\tilde{b}\sub{fit}}
\makeindex
\renewcommand\indexname{Functions and Classes}
\newenvironment{thinlist} {
\begin{list} {} {
\setlength{\topsep}{0.075cm}
\setlength{\parsep}{0.075cm}
\setlength{\itemsep}{0.075cm}
}
} {\end{list}}
\begin{document}
\title{Statistical Methods in the NPStat Package}
\author{I. Volobouev, {\it i.volobouev@ttu.edu}}
\date{Version: 5.1.0 \hspace{1.5cm} Date: \today}
\maketitle
\tableofcontents
\newpage
\section{Preliminaries}
\label{sec:preliminaries}
The NPStat package provides a C++ implementation of
several nonparametric
statistical modeling tools together with
various supporting code. Nonparametric modeling becomes very
useful when there is little prior information available to justify an
assumption that the data belongs to a certain parametric family of
distributions. Even though nonparametric techniques are usually
more computationally intensive than parametric ones,
judicious use of fast algorithms in combination with
increasing power of modern computers often results in very practical
solutions  at least until the ``curse of dimensionality'' starts
imposing its heavy toll.
It will be assumed that the reader of this note is reasonably
proficient in C++ programming, and that the meaning
of such words as ``class'', ``function'', ``template'',
``method'', ``container'', ``namespace'', {\it etc.}
is obvious from their context.
The latest version of the package code can be downloaded from
\href{http://npstat.hepforge.org/}{http://npstat.hepforge.org/}
Most classes and functions implemented in NPStat are placed in the
``npstat'' namespace. A~few functions and classes that
need Minuit2~\cite{ref:minuit} installation
in order to be meaningfully used belong
to the ``npsi'' namespace. Such functions and
classes can be found in the ``interfaces''
directory of the package.
When a C++ function, class, or template is mentioned in the text
for the first time, the header file which contains its declaration
is often mentioned as well. If the header file is not mentioned,
it is most likely ``npstat/stat/NNNN.hh'', where NNNN stands
for the actual name of the class or function.
An expression ``$I(Q)$'' is frequently employed in the text,
where $Q$ is some logical proposition.
$I(Q)$ is the proposition indicator function
defined as follows: $I(Q) = 1$ if $Q$ is true
and $I(Q) = 0$ if $Q$ is false. For example, a product of $I(0 \le x \le 1)$
with some other mathematical function is often used to
denote some probability density supported
on the $[0, 1]$ interval. Note that $I(\neg Q) = 1  I(Q)$, where
$\neg Q$ is the logical negation of $Q$.
\section{Data Representation}
\label{sec:datarep}
For data storage, the NPStat software relies on the object serialization
and I/O capabilities provided
by the C++ Geners package~\cite{ref:geners}.
Geners supports persistence of all standard library containers
(vectors, lists, maps, {\it etc.}) including those introduced
in the C++11 Standard~\cite{ref:cpp11} (arrays, tuples, unordered sets
and maps). In addition to data structures supported by Geners, NPStat
offers several other persistent containers useful in statistical
data analysis: multidimensional arrays, homogeneous $n$tuples, and
histograms.
Persistent multidimensional arrays are implemented with the \cname{ArrayND}
template (header file ``npstat/nm/ArrayND.hh'').
This is a~reasonably fullfeatured array class, with support for
subscripting (with and without bounds checking), vector space operations
(addition, subtraction,
multiplication by a scalar), certain tensor operations (index contraction,
outer product), slicing, iteration over subranges, linear and cubic
interpolation of array values
to fractional indices, {\it etc.} Arrays whose maximum
number of elements is known at the compile time can be efficiently
created on the stack.
Homogeneous $n$tuples are twodimensional tables in which the number of
columns is fixed while the number of rows can grow dynamically.
The stored object type is chosen when the $n$tuple is created
(it is a template parameter). For
homogeneous $n$tuples, storage type is the same for every column
which allows for more efficient optimization of memory allocation
in comparison with heterogeneous $n$tuples.\footnote{The Geners
serialization package supports
several types of persistent heterogeneous $n$tuples including
std::vector$<$std::tuple$<$...$> >$,
RowPacker variadic template
optimized for accessing large data samples rowbyrow,
and ColumnPacker variadic template
optimized for data access columnbycolumn.}
Two persistent template classes are provided for homogeneous $n$tuples:
\cname{InMemoryNtuple} for $n$tuples which can completely fit inside
the computer memory (header file ``npstat/stat/InMemoryNtuple.hh'')
and \cname{ArchivedNtuple} for $n$tuples
which can grow as large as the available disk space
(header file ``npstat/stat/ArchivedNtuple.hh''). Both of these
classes inherit from the abstract class \cname{AbsNtuple} which
defines all interfaces relevant for statistical analysis of the data
these $n$tuples contain. Thus \cname{InMemoryNtuple} and \cname{ArchivedNtuple}
are completely interchangeable for use in various data analysis algorithms.
Arbitrarydimensional\footnote{To be precise,
the number of dimensions for both histograms
and multidimensional arrays can not exceed
31 on 32bit systems and 63 on 64bit systems. However, your computer will
probably run out of memory long before this limitation
becomes relevant for your data analysis.} persistent
histograms are implemented with the \cname{HistoND}
template (header file ``npstat/stat/HistoND.hh''). Both uniform and nonuniform
placing of bin edges can be created using
axis classes \cname{HistoAxis} and \cname{NUHistoAxis}, respectively,
as \cname{HistoND} template parameters. The \cname{DualHistoAxis} class
can be employed for histograms in which only a fraction of axes
must have nonuniform bins.
Two binning schemes are supported
for data sample processing.
In the normal scheme, the bin count is incremented if the point coordinates
fall inside the given bin (the \cname{fill} method of the class).
In the centroidpreserving scheme,
all bins in the neighborhood of the sample point
are incremented in such a way that the center of mass of bin
increments coincides exactly with the point location
(the \cname{fillC} method)\footnote{For onedimensional histograms,
this is called ``linear binning''~\cite{ref:linearbinning}.
NPStat supports this technique for uniformly binned
histograms of arbitrary dimensionality.}.
The objects used as
histogram bin contents and the objects used as
weights added to the bins when
the histogram is filled are not required to have
anything in common; the only requirement is that a~meaningful
binary function (typically, operator+=) can be defined for them.
This level of abstraction allows, for example, for using vectors and tensors
as histogram bin contents or for implementing profile histograms with
the same \cname{HistoND} template. As the bin contents are implemented
with the \cname{ArrayND} class, all features of \cname{ArrayND}
(vector space operations, slicing, interpolation, {\it etc}) are available
for them as well. Overflows are stored in separate arrays with
three cells in each dimension: underflow, overflow, and the central part
covered by the regular histogram bins.
\section{Descriptive Statistics}
\label{sec:descriptivestats}
A number of facilities is included in the NPStat package for calculating
standard descriptive sample statistics such as mean, covariance matrix,
{\it etc}. Relevant functions, classes, and templates are described below.
\subsection{Functions and Classes for Use with Point Clouds}
\cname{arrayStats} (header file ``npstat/stat/arrayStats.hh'')  This
function estimates the population mean ($\mu$), standard deviation ($\sigma$),
skewness ($s$),
and kurtosis ($k$) for a onedimensional set of points. A numerically sound
twopass algorithm is used.
% Internally, calculations are performed in long double precision.
The following formulae are implemented ($N$ is the number of points
in the sample):
\begin{equation}
\mu = \frac{1}{N} \sum_{i=0}^{N1} x_i
\label{eq:samplemean}
\end{equation}
\begin{equation}
\sigma = \sqrt{\frac{1}{N  1} \sum_{i=0}^{N1} (x_i  \mu)^2}
\label{eq:samplestdev}
\end{equation}
\begin{equation}
s = \frac{N}{(N  1) (N  2) \sigma^3} \sum_{i=0}^{N1} (x_i  \mu)^3
\end{equation}
\begin{equation}
g_2 = N \frac{\sum_{i=0}^{N1} (x_i  \mu)^4}{\left( \sum_{i=0}^{N1} (x_i  \mu)^2 \right)^2}  3
\label{eq:kurtosisg2}
\end{equation}
\begin{equation}
k = \frac{N  1}{(N  2) (N  3)} ((N + 1) g_2 + 6) + 3
\label{eq:kurtosis}
\end{equation}
Note that the population kurtosis estimate is defined in such a way that its
expectation value for a sample drawn from the Gaussian distribution equals
3 rather than 0. For more details on the origin of these formulae
consult Ref.~\cite{ref:sampleskewkurt}.
\cname{empiricalCdf} (header file ``npstat/stat/StatUtils.hh'')  This
function evaluates the empirical cumulative distribution function (ECDF) for
a sample of points. The points must be sorted
in the increasing order by the user before
the function call is made.
The ECDF is defined as follows. Suppose, the $N$ sorted point values are
$\{x_0, x_1, ..., x_{N1}\}$ and all $x_i$ are distinct ({\it i.e.,} there
are no ties). Then
\begin{equation}
\mbox{ECDF}(x) = \left\{ \begin{array}{ll}
0 & \mbox{if\ } x \le x_0, \\
\frac{1}{N} \left( \frac{xx_i}{x_{i+1}x_i}+i+1/2 \right) & \mbox{if\ } x_i < x \le x_{i+1} \ \mbox{and} \ x < x_{N1}, \\
1 & \mbox{if\ } x \ge x_{N1}.
\end{array}
\right.
\label{eq:ecdf}
\end{equation}
When all sample points are distinct, $\mbox{ECDF}(x)$ is discontinuous
at $x = x_0$ and $x = x_{N1}$ and continuous for all other $x$.
If the sample has more than one point
with the same $x$, let say $x_k$ and
$x_{k+1}$ where $k > 0$ and $k < N  2$, $x_k$ also becomes
a point of discontinuity.
$\mbox{ECDF}(x_k)$ will be set to $\frac{1}{N} (k + 1/2)$, the value
coincident with the function left limit.
\cname{empiricalQuantile} (header file ``npstat/stat/StatUtils.hh'')  This
is the empirical quantile function, $\mbox{EQF}(y)$, which provides
an inverse to $\mbox{ECDF}(x)$. For any $x$ from the interval
$[x_0, x_{N1}]$, $\mbox{EQF}(\mbox{ECDF}(x)) = x$ (up to numerical
roundoff errors). If $x < x_0$ then $\mbox{EQF}(\mbox{ECDF}(x)) = x_0$ and if
$x > x_{N1}$ then $\mbox{EQF}(\mbox{ECDF}(x)) = x_{N1}$.
\cname{MultivariateSumAccumulator} and \cname{MultivariateSumsqAccumulator}
 These classes are intended for calculating means and covariance matrices
for multivariate data samples in a~numerically stable manner. The
classes can be
used inside userimplemented cycles over sets of points or with
\cname{cycleOverRows} and other similar methods of the \cname{AbsNtuple}
class. Two passes over the data
are necessary for calculating the covariance matrix,
first with
\cname{MultivariateSumAccumulator} and second with
\cname{MultivariateSumsqAccumulator}. The multivariate
mean is initially found from
\begin{equation}
\overline{{\bf x}} = \frac{1}{N} \sum_{i=0}^{N1} {\bf x}_i
\end{equation}
and then an estimate of the population covariance matrix is calculated as
\begin{equation}
V = \frac{1}{N  1} \
\sum_{i=0}^{N1} ({\bf x}_i  \overline{{\bf x}}) ({\bf x}_i  \overline{{\bf x}})^T
\label{eq:covar}
\end{equation}
\cname{MultivariateWeightedSumAccumulator} and
\cname{MultivariateWeightedSumsqAccumulator}  These classes are
intended for calculating means and covariance matrices
for multivariate data samples in which points enter with nonnegative
weights:
\begin{equation}
\overline{{\bf x}} = \frac{\sum_{i=0}^{N1} w_i {\bf x}_i}{\sum_{i=0}^{N1} w_i},
\label{eq:weightedmean}
\end{equation}
\begin{equation}
V = \frac{N_{eff}}{N_{eff}  1} \
\frac{\sum_{i=0}^{N1} w_i ({\bf x}_i  \overline{\bf x}) ({\bf x}_i  \overline{\bf x})^T}{\sum_{i=0}^{N1} w_i},
\label{eq:weightedcov}
\end{equation}
where $N_{eff}$ is the effective number of entries in a weighted
sample\footnote{$N_{eff}$ is also known as the Kish's effective sample size. Note that this
estimate of $V$ makes sense only if the weights and point locations are statistically independent
from each other. If the weights are functions of the observed ${\bf x}_i$ then there
appears to be no general estimate that is independent of the weighting function.}:
\begin{equation}
N_{eff} = \frac{\left( \sum_{i=0}^{N1} w_i \right)^2}{\sum_{i=0}^{N1} w_i^2}.
\label{eq:neff}
\end{equation}
Two passes over the data are required, first with
\cname{MultivariateWeightedSumAccumulator} and then with
\cname{MultivariateWeightedSumsqAccumulator}.
\cname{StatAccumulator}  Persistent class which determines
the minimum, the maximum, the mean, and the standard
deviation of a~sample of values using a singlepass algorithm.
It can be used when the twopass calculation
is either impossible or impractical for
some reason. This class can be conveniently employed as
a bin content type for \cname{HistoND} (this turns \cname{HistoND}
into a profile histogram).
\cname{StatAccumulator}
maintains a running average which is updated
after accumulating $2^K$ points, $K = 0, 1, 2, ...$ The
numerical precision of the standard deviation determined
in this manner is not as good as in the twopass method but it
is usually much better than that expected
from a~naive implementation which simply accumulates sample moments about 0
and then evaluates $\sigma^2$ according to
$\frac{N}{N  1} (\overline{{\bf x}^2}  {\overline{\bf x}}^2)$ (this formula
can suffer from a severe subtractive cancellation problem).
\cname{WeightedStatAccumulator}  Similar class for calculating
mean and standard deviation of a~sample of weighted values using
a singlepass algorithm.
\cname{StatAccumulatorArr}  This class provides functionality similar
to an array of \cname{StatAccumulator} objects but with a more convenient
interface for accumulating sample values (cycling over the input values
is performed by the class itself).
\cname{StatAccumulatorPair}  Two \cname{StatAccumulator} objects
augmented by the sum of cross terms which allows for subsequent
calculation of covariance. The covariance is estimated using
a formula which can potentially suffer from subtractive
cancellation\footnote{This problem is partially alleviated
by using long double type for internal calculations.}.
Therefore, this class should not be used to process large samples of points in
which, for one of the variables or for both of them,
the absolute value of the mean can be much larger than the
standard deviation. This class is persistent.
\cname{CrossCovarianceAccumulator}  To accumulate the sums for
covariance matrix calculation according to formula~\ref{eq:covar}
in $d$dimensional space,
one has to keep track of $d (d + 1)/2$ sums of squares and cross terms.
Sometimes only a small part of the complete covariance matrix is of interest.
In this case $d$ can often be split into subspaces of dimensionalities
$d_1$ and $d_2$, $d = d_1 + d_2$, so that the interesting part of
covariance matrix is dimensioned $d_1 \times d_2$ (plus $d$ diagonal terms).
For large values
of $d$, accumulating $d + d_1 \times d_2$ sums instead of $d (d + 1)/2$
can mean big difference in speed and memory consumption, especially
in case $d_1 \ll d_2$ (or $d_1 \gg d_2$).
The \cname{CrossCovarianceAccumulator} allows
its users to do just that: it accumulates cross terms between two
arrays with $d_1$ and $d_2$ elements but not the cross terms between
elements of the same array. The covariances are estimated by a singlepass
method using a~formula which can suffer from subtractive
cancellation\footnotemark[5]. Use with care.
\cname{SampleAccumulator}  This class stores all values
it gets in an internal buffer. Whenever mean, standard deviation,
or some quantile function values are needed for the accumulated sample,
they are calculated from the stored values using numerically sound techniques.
Mean and standard deviation are calculated according to
Eqs.~\ref{eq:samplemean} and~\ref{eq:samplestdev}.
Covariance and correlations can be calculated for a pair of
\cname{SampleAccumulator} objects with the same number of stored elements
by corresponding methods of the class.
Empirical CDF and empirical quantile calculations
are handled by calling \cname{empiricalCdf} and \cname{empiricalQuantile}
functions internally, preceded by data sorting.
This class is persistent and can be used as the bin content template
parameter for \cname{HistoND}.
\cname{WeightedSampleAccumulator}  Similar to SampleAccumulator,
but intended for calculating various statistics for samples
of weighted points.
Kendall's $\tau$ and Spearman's $\rho$
rank correlation coefficients can be estimated
by functions \cname{sampleKendallsTau} and \cname{sampleSpearmansRho},
respectively. These functions are declared in the header files
``npstat/stat/kendallsTau.hh'' and ``npstat/stat/spearmansRho.hh'',
respectively.
\subsection{Functions for Use with Binned Data}
\cname{histoMean} (header file ``npstat/stat/histoStats.hh'')  This
function calculates the histogram center of mass according to
Eq.~\ref{eq:weightedmean} with bin contents
used as weights for bin center coordinates.
Note that the code of this function does not enforce nonnegativity of all bin
values, and the result of~\ref{eq:weightedmean} will not necessarily
make a lot of sense in case negative weights are present.
In order to check that all bins are nonnegative
and that there is at least one positive bin, you can call the \cname{isDensity}
method of the \cname{ArrayND} class on the histogram
bin contents. The \cname{histoMean}
function is standalone and not a member of the \cname{HistoND} class
because Eq.~\ref{eq:weightedmean} is not going to make sense for all
possible bin types.
\cname{histoCovariance} (header file ``npstat/stat/histoStats.hh'')  This
function estimates the population covariance matrix for histogrammed data
according to Eq.~\ref{eq:weightedcov}, with bin contents
used as weights for bin center coordinates.
As for \cname{histoMean}, results
produced by \cname{histoCovariance}
will not make much sense in case negative bin values are present.
\cname{arrayCoordMean} (header file ``npstat/stat/arrayStats.hh'')  Similar
to \cname{histoMean} but the bin values are provided in an \cname{ArrayND}
object and bin locations are specified by additional arguments.
Only uniform bin spacing is supported by this function
(unlike \cname{histoMean} which simply gets its bin centers from
the argument histogram).
\cname{arrayCoordCovariance} (header file ``npstat/stat/arrayStats.hh'')
 Similar to \cname{histoCovariance} but the bin values are provided
in an \cname{ArrayND} object and bin locations are specified by additional
arguments. Uniform binning only.
\cname{arrayShape1D} (header file ``npstat/stat/arrayStats.hh'')  Estimates
population mean, standard deviation, skewness, and kurtosis for
onedimensional histogrammed data with equidistant bins.
Bin values $b_i$ used as weights are provided in an \cname{ArrayND}
object and bin locations $x_i$ are specified by additional arguments.
The mean and the standard deviations squared
are calculated according to Eqs.~\ref{eq:weightedmean}
and~\ref{eq:weightedcov} reduced to one dimension.
The skewness is estimated as
\begin{equation}
s = \frac{N_{eff}^2}{(N_{eff}  1) (N_{eff}  2) \sigma^3} \
\frac{\sum_{i=0}^{N_b1} b_i (x_i  \mu)^3}{\sum_{i=0}^{N_b1} b_i},
\end{equation}
with $N_{eff}$ defined by Eq.~\ref{eq:neff}. The kurtosis is found from Eq.~\ref{eq:kurtosis} in which $N$
is replaced by $N_{eff}$ and, instead of Eq.~\ref{eq:kurtosisg2},
$g_2$ is determined from
\begin{equation}
g_2 = \sum_{i=0}^{N_b1} b_i \ \frac{\sum_{i=0}^{N_b1} b_i (x_i  \mu)^4}{\left( \sum_{i=0}^{N_b1} b_i (x_i  \mu)^2 \right)^2}  3.
\end{equation}
\cname{arrayQuantiles1D} (header file ``npstat/stat/arrayStats.hh'')
 This function treats
onedimensional arrays as histograms and determines the values
of the quantile function for a~given set of cumulative density values
(after normalizing the histogram area to 1). Each bin is considered
to have uniform probability density between its edges. If you need
to perform this type of calculation more than once with the same
array, it will be more efficient to construct a \cname{BinnedDensity1D}
object and then to use its \cname{quantile} method instead of calling
the \cname{arrayQuantiles1D} function multiple times.
\cname{arrayEntropy} (header file ``npstat/stat/arrayStats.hh'')
 This function calculates
\begin{equation}
\label{eq:entropy}
H = \frac{1}{N_b} \sum_{i=0}^{N_b1} b_i \ln b_i
\end{equation}
This formula is appropriate for arrays that tabulate probability density
values on uniform grids. To convert the result into the actual entropy
of the density, multiply it by the volume of the distribution support.
\subsection{ArrayND Projections}
The following set of classes works with \cname{ArrayND} class
and assists in making array ``projections''.
Projections reduce array dimensionality
by calculating certain array properties over projected indices.
For example, one might want to create a onedimensional array, $a$,
from a threedimensional array, $b$, by summing all array elements
with the given second index: $a_j = \sum_{i, k} b_{ijk}$ (this means
that the first and the third indices of the array are ``projected''). This
and other similar operations can be performed by the \cname{project}
method of the \cname{ArrayND} class. What is done with the
array elements when the projection is performed can be specified
by the ``projector'' argument of the \cname{project} method. In particular,
this argument can be an object which performs some statistical
analysis of the projected elements.
\cname{ArrayMaxProjector} (header file ``npstat/stat/ArrayProjectors.hh'')
 For each value of the array indices which are not projected,
finds the maximum array element for all possible values of
projected indices.
\cname{ArrayMinProjector} (header file ``npstat/stat/ArrayProjectors.hh'')
 Finds the minimum array element for all possible values of
projected indices.
\cname{ArraySumProjector} (header file ``npstat/stat/ArrayProjectors.hh'')
 Sums all array values in the iteration over projected indices.
\cname{ArrayMeanProjector} (header file ``npstat/stat/ArrayProjectors.hh'')
 Calculates the mean array value over projected indices.
\cname{ArrayMedianProjector} (header file ``npstat/stat/ArrayProjectors.hh'')
 Calculates the median array value over projected indices.
\cname{ArrayStdevProjector} (header file ``npstat/stat/ArrayProjectors.hh'')
 Calculates the standard deviation of array values encountered
during the iteration over projected indices.
\cname{ArrayRangeProjector} (header file ``npstat/stat/ArrayProjectors.hh'')
 Calculates the ``range'' of array values over projected indices.
``Range'' is defined here as the difference between 75\supers{th} and
25\supers{th} percentiles of the sample values divided by the
distance between 75\supers{th} and 25\supers{th} percentiles of
the Gaussian distribution with $\sigma = 1$. Therefore, ``range'' can
be used as a robust estimate of the standard deviation.
All ``projectors'' are derived either from \cname{AbsArrayProjector} class
in case the calculated quantity depends on the array indices
(header file ``npstat/nm/AbsArrayProjector.hh'')
or from \cname{AbsVisitor} class in case it is sufficient
to know just the element values (header file ``npstat/nm/AbsVisitor.hh'').
Naturally, the user
can supply his/her own projector implementations derived
from these base classes for use
with the the \cname{project} method of the \cname{ArrayND} class.
\section{Statistical Distributions}
\label{sec:distros}
A number of univariate and multivariate statistical distributions
is supported by the NPStat package. All implementations of
univariate continuous distributions share a common interface and can be
used interchangeably in a variety of statistical algorithms.
A similar approach has been adopted for univariate discrete and
multivariate continuous distributions.
\subsection{Univariate Continuous Distributions}
All classes which represent univariate continuous distributions
inherit from the \cname{AbsDistribution1D} abstract base class. These classes
must implement methods \cname{density} (probability density function),
\cname{cdf} (cumulative distribution function), \cname{exceedance}
(exceedance probability is just 1 $$ cdf, but direct application
of this formula is often
unacceptable in numerical calculations due to subtractive cancellation),
and \cname{quantile} (the quantile function, inverse of cdf). For
a large number of statistical distributions, the probability density
function looks like $\frac{1}{\sigma}\, p\left( \frac{x  \mu}{\sigma} \right)$, where
$\mu$ and $\sigma$ are the distribution location and scale parameters, respectively.
Distributions of this kind should inherit from the base class
\cname{AbsScalableDistribution1D} which handles proper
scaling and shifting of the argument (this base class itself inherits
from \cname{AbsDistribution1D}, and it is declared in the same
header file ``npstat/stat/AbsDistribution1D.hh'').
Some of the standard univariate continuous distributions implemented
in NPStat are listed in Table~\ref{table:distros1d}. A few special
distributions less frequently encountered in the statistical
literature are described in more detail in the following subsections.
Of course, userdeveloped classes inheriting from \cname{AbsDistribution1D}
or \cname{AbsScalableDistribution1D} can also be employed with all
NPStat functions and classes that take an instance of
\cname{AbsDistribution1D} as one of parameters.
\begin{longtable}{ccc}
% \captionsetup{width=.9\textwidth}
\caption{Continuous univariate distributions included in NPStat.
$P_n(x)$ are the Legendre polynomials.
Parameters $\mu$ and $\sigma$ are not shown for scalable distributions.
When not given explicitly, the normalization constant $\cal{N}$
ensures that
$\int_{\infty}^{\infty} p(x) dx = 1$. Most of the classes
listed in this table are declared in the header file
``npstat/stat/Distributions1D.hh''. If the distribution
is not declared in that header then it has a dedicated header
with the same name, {\it e.g.}, ``npstat/stat/TruncatedDistribution1D.hh''
for \cname{TruncatedDistribution1D}.}
\label{table:distros1d} \\
\hline \multicolumn{1}{c}{Class Name} &
\multicolumn{1}{c}{$p(x)$} &
\multicolumn{1}{c}{Scalable?} \\ \hline \hline
\endfirsthead
\multicolumn{3}{c}%
{\tablename\ \thetable{}  continued from the previous page} \\
\hline \multicolumn{1}{c}{Class Name} &
\multicolumn{1}{c}{$p(x)$} &
\multicolumn{1}{c}{Scalable?} \\ \hline \hline
\endhead
\multicolumn{3}{r}{{Continued on the next page}} \\ \hline
\endfoot
\hline
\endlastfoot
\cname{Uniform1D} & $I(0 \le x \le 1)$ & yes \\ \hline
\cname{IsoscelesTriangle1D} & $(1  x) I(1 \le x \le 1)$ & yes \\ \hline
\cname{Exponential1D} & $e^{x} I(x \ge 0)$ & yes \\ \hline
\cname{Quadratic1D} & $(1 + a P_1(2 x  1) + b P_2(2 x  1)) \,I(0 \le x \le 1)$ & yes \\ \hline
\cname{LogQuadratic1D} & ${\cal{N}} \exp (a P_1(2 x  1) + b P_2(2 x  1)) \,I(0 \le x \le 1)$ & yes \\ \hline
\cname{Gauss1D} & $\frac{1}{\sqrt{2 \pi}} e^{x^2/2}$ & yes \\ \hline
\cname{GaussianMixture1D} & $\frac{1}{\sqrt{2 \pi}} \sum_i \frac{w_i}{\sigma_i} e^{(x\mu_i)^2/(2 \sigma_i^2)}, \ \mbox{where} \ \sum_i w_i = 1$ & yes \\ \hline
\cname{TruncatedGauss1D} & $\frac{\cal{N}}{\sqrt{2 \pi}} e^{x^2/2} \, I(n_{\sigma} \le x \le n_{\sigma})$ & yes \\ \hline
\cname{MirroredGauss1D} & \begin{minipage}{0.52\linewidth}
\vskip3mm \begin{eqnarray*}
I(0 \le x \le 1)\, \frac{1}{\sqrt{2 \pi} s} \sum_{i=\infty}^{\infty} &
\left[ e^{(x  m + 2\,i)^2/(2 s^2)} + \right.\\
& \left.e^{(x + m + 2\,i)^2/(2 s^2)}\right],
\end{eqnarray*} \vskip1mm
where $0 \le m \le 1$ and $s > 0$ \newline \vskip8mm $\mbox{}$ \end{minipage} & yes \\ \hline
\cname{BifurcatedGauss1D} & \begin{minipage}{0.52\linewidth}
\vskip2mm \begin{eqnarray*}
{\cal{N}} & \left[e^{x^2/(2 \alpha)} I(n_{\sigma,L} \le x < 0) \, + \right.\\
& \left. e^{x^2/(2 (1  \alpha))} I(0 \le x < n_{\sigma,R})\right]
\end{eqnarray*} \vskip0mm \end{minipage} & yes \\ \hline
\cname{UGaussConvolution1D} & $\frac{1}{\sqrt{2 \pi} (b  a)} \, e^{x^2/2} * I(a \le x \le b)$ & yes \\ \hline
\cname{SymmetricBeta1D} & ${\cal{N}} \, (1  x^2)^p \, I(1 < x < 1)$ & yes \\ \hline
\cname{Beta1D} & ${\cal{N}} \, x^{\alpha  1} (1  x)^{\beta  1} \, I(0 < x < 1)$ & yes \\ \hline
\cname{Gamma1D} & ${\cal{N}} \, x^{\alpha  1} e^{x} \, I(x > 0)$ & yes \\ \hline
\cname{Pareto1D} & $\alpha x^{\alpha  1} I(x \ge 1)$ & yes \\ \hline
\cname{Huber1D} & ${\cal{N}} \, [e^{x^2/2} I(x \le a) + e^{\, a (a/2  x)} (1  I(x \le a))] $ & yes \\ \hline
\cname{Cauchy1D} & $\pi^{1} (1 + x^2)^{1}$ & yes \\ \hline
\cname{StudentsT1D} & ${\cal{N}} \, (1 + x^2/N_{dof})^{(N_{dof} + 1)/2}$ & yes \\ \hline
\cname{Moyal1D} & $\frac{1}{\sqrt{2 \pi}} e^{(x + e^{x})/2}$ & yes \\ \hline
\cname{Logistic1D} & $e^{x} \,(1 + e^{x})^{2}$ & yes \\ \hline
\cname{Tabulated1D} & \begin{minipage}{0.52\linewidth}
\vskip1mm
Defined by a table of equidistant values on
the [0, 1] interval, interpolated by
a~polynomial (up to cubic). The first table
point is at $x = 0$ and the last is at $x = 1$. \newline \vskip8mm $\mbox{}$
% Normalization is computed automatically.
\end{minipage} & yes \\ \hline
\cname{BinnedDensity1D} & \begin{minipage}{0.52\linewidth}
\vskip1mm
Defined by a table of $N$ equidistant values
on the [0, 1] interval, with optional linear
interpolation. The first table point is at
$x = 1/(2 N)$ and the last is at $x = 1  1/(2 N)$.
Useful for converting 1d histograms into distributions.
\end{minipage} & yes \\ \hline
\cname{QuantileTable1D} & \begin{minipage}{0.52\linewidth}
\vskip1mm
Defined by a table of $N$ equidistant quantile function values
on the [0, 1] interval with linear interpolation between these values
(so that density looks like a histogram with equal area bins).
The first table point is at
$x = 1/(2 N)$ and the last is at $x = 1  1/(2 N)$.
Useful for converting data samples into distributions
by sampling empirical quantiles.
\end{minipage} & yes \\ \hline
\cname{DistributionMix1D} & $\sum_i w_i p_i(x), \ \mbox{where} \ \sum_i w_i = 1$ & no \\ \hline
\cname{DeltaMixture1D} & $\sum_i w_i \delta(x  x_i), \ \mbox{where} \ \sum_i w_i = 1$ & yes \\ \hline
\cname{LocationScaleFamily1D} & \begin{minipage}{0.52\linewidth}
\vskip1mm
$\frac{1}{s} p\sub{other}\left(\frac{x  m}{s}\right)$, useful for converting nonscalable
1d distributions into scalable.
\end{minipage} & yes \\ \hline
\cname{RatioOfNormals} & \begin{minipage}{0.52\linewidth}
\vskip1mm
Ratio of two correlated normal random variables, as described in~\cite{ref:ratioofnormals}.
\end{minipage} & no \\ \hline
\cname{LeftCensoredDistribution} & $f \, p\sub{other}(x) + (1  f) \, \delta(x  x_{\infty})$ & no \\ \hline
\cname{RightCensoredDistribution} & $f \, p\sub{other}(x) + (1  f) \, \delta(x  x_{+\infty})$ & no \\ \hline
\cname{TruncatedDistribution1D} & ${\cal{N}} \, p\sub{other}(x) \, I(x\sub{min} \le x \le x\sub{max})$ & no \\ \hline
\cname{TransformedDistribution1D} & $p\sub{other}(y) \left \frac{dy}{dx} \right$ for monotonous $y(x)$ & no \\
\end{longtable}
\subsection{Johnson Curves}
The density functions of the Johnson
distributions~\cite{ref:johnson, ref:johnsonbook, ref:hahnshap}
are defined as follows:
\begin{equation}
S_{U}\ \mbox{(unbounded)}:\ \ p(x) = \frac{\delta}{\lambda \sqrt{2 \pi \left(1 +
\left(\frac{x  \xi}{\lambda}\right)^{2}\right)}} \,e^{\frac{1}{2}
\left(\gamma + \delta\,\mbox{\scriptsize sinh}^{1}\left(\frac{x\xi}
{\lambda}\right)\right)^{2}}
\label{eq:johnsu}
\end{equation}
\begin{equation}
S_{B}\ \mbox{(bounded)}:\ \ \ p(x) = \frac{\delta}{\lambda \sqrt{2 \pi}
\left(\frac{x  \xi}{\lambda}\right)\left(1  \frac{x  \xi}{\lambda}\right)}
\,e^{\frac{1}{2}\left( \gamma + \delta \log \left( \frac{x\xi}
{\xi + \lambda  x}\right) \right)^{2}} I(\xi < x < \xi + \lambda)
\label{eq:johnsb}
\end{equation}
They are related to the normal distribution, $N(\mu, \sigma)$,
by simple variable
transformations. Variable $z$ distributed according to $N(0, 1)$
can be
obtained from Johnson's variates $x$ by transformation
$z = \gamma + \delta f\left(\frac{x\xi}{\lambda}\right)$:
\begin{displaymath}
\begin{array}{rlll}
f(y) = & \mbox{sinh}^{1}(y) & & S_{U}\ \mbox{curves}\\
f(y) = & \log \left(\frac{y}{1y}\right) & (0 < y < 1) & S_{B}\ \mbox{curves}
\end{array}
\end{displaymath}
Both densities become arbitrarily close to the lognormal density (for which
variable $z = \gamma + \delta \log (x  \xi)$ is normally distributed)
in the limit $\gamma \rightarrow \infty$
and to the normal distribution in the limit $\delta \rightarrow \infty$
($\xi$ and $\lambda$ have to be adjusted accordingly). Johnson's
parameterization of the lognormal density is
\begin{equation}
p(x) = \frac{\delta}{\sqrt{2 \pi} (x  \xi)} e^{\frac{1}{2} (\gamma + \delta \log (x  \xi))^2} I(x > \xi)
\end{equation}
Together with their limiting cases, Johnson's $S_{U}$ and $S_{B}$
distributions attain {\it all possible values of skewness and kurtosis.}
Unfortunately, parameters $\xi, \lambda, \gamma$, and $\delta$
of $S_{U}$ and $S_{B}$ in
Eqs.~\ref{eq:johnsu} and~\ref{eq:johnsb} have no direct
relation to each other.
Crossing the lognormal boundary requires a discontinuous change in
the parameter values which is
rather inconvenient for practical data fitting purposes. This
problem can be alleviated by reparameterizing the functions
in terms of mean $\mu$, standard deviation $\sigma$,
skewness $s$, and kurtosis $k$, so that the corresponding
curve type and the original parameters $\xi, \lambda, \gamma, \delta$
are determined numerically. Examples of Johnson's density functions
are shown in Fig~\ref{johns_exa}.
\begin{figure}[t]
\begin{center}
\includegraphics[width=0.6\textwidth]{johnson_examples.pdf}
\caption{ Black: $s = 0, ~k = 3$, Gaussian. Red: $s = 1.5, ~k = 10$, $S_{U}$.
Blue: $s = 0.5, ~ k = 2$, $S_{B}$.
For all three curves, the mean is 0 and the standard deviation is 1.}
\label{johns_exa}
\end{center}
\end{figure}
Johnson's $S_{U}$ and $S_{B}$ curves are implemented in NPStat with classes
\cname{JohnsonSu} and \cname{JohnsonSb}, respectively (header file
``npstat/stat/JohnsonCurves.hh''). Both of these
distributions are parameterized by $\mu$, $\sigma$, $s$, and $k$,
with an automatic internal conversion into $\xi, \lambda, \gamma, \delta$.
An original algorithm was developed to perform this conversion,
based in part on ideas from Refs.~\cite{ref:draper, ref:hill}.
The lognormal distribution parameterized by $\mu$, $\sigma$, and $s$
is implemented by the \cname{LogNormal} class (header file
``npstat/stat/Distributions1D.hh''). The class \cname{JohnsonSystem}
(header file ``npstat/stat/JohnsonCurves.hh'') can be used when automatic
switching between $S_{U}$, $S_{B}$,
lognormal, and Gaussian distributions is desired.
\subsection{Composite Distributions}
Composite distributions are built out of two or more
component distributions. One of these component distributions is
arbitrary while
all others must have a density supported on the interval
[0, 1]. Suppose, $G_k(x), \ k = 1, 2, 3, ...$ are cumulative
distribution functions with corresponding densities $g_i(x)$
proportional to $I(0 \le x \le 1)$.
Then, if $H(x)$ is a cumulative distribution function
with density $h(x)$, $F_1(x) = G_1(H(x))$ is also a cumulative
distribution function with density $f_1(x) = h(x) \, g_1(H(x))$.
Similarly, $F_2(x) = G_2(F_1(x))$ is a cumulative distribution with density
$f_2(x) = f_1(x) g_2(F_1(x)) = h(x) \, g_1(H(x)) g_2(G_1(H(x)))$.
We can now construct $F_3(x) = G_3(F_2(x))$ and so on. This sequence can
be terminated after an arbitrary number of steps.
Note that $f_1(x) = h(x)$ in case $g_1(x)$ is a uniform
probability density. Small deviations from uniformity in $g_1(x)$
will lead to corresponding small changes in $f_1(x)$.
The resulting density model can be quite flexible, even if
the component distributions $H(x)$ and $G_1(x)$ are
simple. Therefore, data samples with complicated sets
of features can be modeled as follows: construct
an approximate model $h(x)$ first, even if it does not fit
the sample quite right.
Then transform the data points $x_i$ to the [0, 1] interval
according to $y_i = H(x_i)$. The density of
points $y_i$\footnote{This density is called ``relative density''~\cite{ref:handcock}
or ``comparison density''~\cite{ref:thas} in the statistical literature. Note
that $h(x)$ should be selected in such a way that the ratio between
the unknown population density of the sample under study and $h(x)$ should
be bounded for all $x$. If it is not, the relative density
will usually be unbounded, and its subsequent representation
by polynomials or logpolynomials will not lead to a consistent
estimate. Johnson curves often work reasonably well as $h(x)$.}
can now be
fitted to another parametric distribution (including composite one)
or it can be modeled by nonparametric techniques~\cite{ref:kdetransform}.
Due to the elimination of the boundary bias, the LOrPE density
estimation method described in Section~\ref{sec:lorpe}
becomes especially useful in the latter approach.
Once the appropriate $G(y)$ is constructed (parametrically
or not), the resulting composite density will provide
a good fit to the original set of points $x_i$\footnote{There is,
of course, a close connection between this density modeling
approach and a number of goodnessoffit techniques based on
the comparison of the empirical cdf
with the cdf of the fitted density. For example, using
Legendre polynomials to model $\log (g_1(x))$, as in the
\cname{LogQuadratic1D} density, directly
improves the test criterion used in the Neyman's
smooth test for goodnessoffit~\cite{ref:thas, ref:smoothtest}.}.
The composite distributions are implemented in NPStat
with the \cname{CompositeDistribution1D} class.
One composite distribution can be used to construct
another, thus allowing for composition chains of
arbitrary length, as described at the beginning of
this subsection. Several prebuilt distributions of this type
are included in the NPStat package (they are declared in the header file
``npstat/stat/CompositeDistros1D.hh''). Flexible
models potentially capable of fitting wide varieties
of univariate data samples are implemented by the \cname{JohnsonLadder}
and \cname{BinnedCompositeJohnson} classes. In both of these,
Johnson curves are used as $H(x)$. \cname{JohnsonLadder}
takes an arbitrarily long sequence of parametric \cname{LogQuadratic1D}
distributions for $G_k(x)$ while \cname{BinnedCompositeJohnson}
is using a single nonparametric \cname{BinnedDensity1D} as $G_1(x)$.
\subsection{Univariate Discrete Distributions}
Classes which represent univariate discrete distributions
inherit from the \cname{AbsDiscreteDistribution1D} abstract base class.
The interface defined by this base class differs in a number of ways
from the \cname{AbsDistribution1D} interface.
Instead of the method \cname{density}
used with arguments of type ``double'',
discrete distributions have the method \cname{probability} defined for
the arguments of type ``long int''.
The methods \cname{cdf} and \cname{exceedance}
have the same signatures as corresponding methods of continuous distributions,
but the \cname{quantile} function is returning long integers. The
method \cname{random} generates long integers as well.
Discrete univariate distributions which can be trivially shifted should
inherit from the \cname{ShiftableDiscreteDistribution1D} base class which
handles the shift operation. There is, however, no operation analogous
to scaling of continuous distributions.
Univariate discrete distributions implemented
in NPStat are listed in Table~\ref{table:discrdistros1d}.
\begin{table}[ht!]
\caption{Discrete univariate distributions included in NPStat.
The location parameter is not shown explicitly for shiftable distributions.
Classes listed in this table are declared in the header file
``npstat/stat/DiscreteDistributions1D.hh''.}
\label{table:discrdistros1d}
\begin{center}
\noindent\begin{tabular}{ccc} \hline
Class Name & $p(n)$ & Shiftable? \\ \hline\hline
\cname{DiscreteTabulated1D} & \begin{minipage}{0.52\linewidth}
\vskip1mm
Defined by a table of probability values.
Normalization is computed automatically.
\end{minipage} & yes \\ \hline
\cname{Poisson1D} & $\frac{\lambda^n}{n!} e^{\lambda}$ & no \\ \hline
\end{tabular}
\end{center}
\end{table}
Mixtures of discrete univariate distributions with finite support
can be implemented using the function \cname{pooledDiscreteTabulated1D}
(header file ``npstat/stat/DiscreteDistributions1D.hh'').
\subsection{Multivariate Continuous Distributions}
All classes which represent multivariate continuous distributions
inherit from the \cname{AbsDistributionND} abstract base class.
These classes must implement methods \cname{density} (probability
density function) and \cname{unitMap} (mapping from the unit
$d$dimensional cube, $U_d$, into the density support region).
Densities that can be shifted and scaled in each coordinate separately
should be derived from the \cname{AbsScalableDistributionND} base class
(which itself inherits from \cname{AbsDistributionND}).
Classes representing densities that look like $p({\bf x}) = \prod q(x_i)$,
where $q(x)$ is some onedimensional density,
should be derived from the \cname{HomogeneousProductDistroND} base class
(the three base classes just mentioned
are declared in the ``npstat/stat/AbsDistributionND.hh'' header file).
Simple classes inheriting from \cname{AbsDistributionND}
are listed in Table~\ref{table:distrosnd}.
\begin{table}[ht!]
\caption{Continuous multivariate distributions included in NPStat.
These distributions are predominantly intended for use as multivariate
density estimation kernels.
Shifts and scale factors are not shown for scalable distributions.
Here, ``scalability'' means
the ability to adjust the shift and scale parameters in each
dimension, not the complete bandwidth matrix.
When not given explicitly, the normalization constant $\cal{N}$
ensures that
$\int p({\bf x}) d{\bf x} = 1$. All of these distributions
are declared in the header file ``npstat/stat/DistributionsND.hh''
with exception of \cname{ScalableGaussND} which has its own header.}
\label{table:distrosnd}
\begin{center}
\noindent\begin{tabular}{ccc} \hline
Class Name & $p({\bf x})$ & Scalable? \\ \hline\hline
\cname{ProductDistributionND} & $\prod_{i=1}^{d} p_i(x_i)$ &
\begin{minipage}{0.13\linewidth}
depends on components
\end{minipage} \\ \hline
\cname{UniformND} & $\prod_{i=1}^{d} I(0 \le x_i \le 1)$ & yes \\ \hline
\cname{ScalableGaussND} & $(2 \pi)^{d/2} e^{{\bf x}^2/2}$ & yes \\ \hline
\cname{ProductSymmetricBetaND} & ${\cal{N}} \prod_{i=1}^{d} (1  x_i^2)^p I(1 < x_i < 1)$ & yes \\ \hline
\cname{ScalableSymmetricBetaND} & ${\cal{N}} (1  {\bf x}^2)^p I({\bf x} < 1)$ & yes \\ \hline
\cname{ScalableHuberND} & ${\cal{N}} \, [e^{{\bf x}^2/2} I({\bf x} \le a) + e^{\, a (a/2  {\bf x})} (1  I({\bf x} \le a))]$ & yes \\ \hline
\cname{RadialProfileND} & \begin{minipage}{0.52\linewidth}
\vskip1mm
Arbitrary centrallysymmetric density.
Defined by its radial profile:
a table of equidistant values on
the [0, 1] interval, interpolated by
a~polynomial (up to cubic). The first table
point is at ${\bf x} = 0$ and the last is at
${\bf x} = 1$. For ${\bf x} > 1$ the density is 0.
Normalization is computed automatically.
\end{minipage} & yes \\ \hline
\cname{BinnedDensityND} & \begin{minipage}{0.52\linewidth}
\vskip1mm
Defined by a table of values
on $U_d$, equidistant in each dimension,
with optional multilinear
interpolation. In each dimension,
the first table point is at
$x_i = 1/(2 N_i)$ and the last is at
$x_i = 1  1/(2 N_i)$.
Useful for converting multivariate
histograms into distributions.
\end{minipage}
& yes \\ \hline
\end{tabular}
\end{center}
\end{table}
\subsection{Copulas}
For any continuous multivariate density $p({\bf x}{\bf a})$
in a $d$dimensional space $X$ of random variables ${\bf x} \in X$
depending on a vector of parameters ${\bf a}$,
we define $d$ marginal densities $p_i(x_i{\bf a})$, $i = 1, \ldots, d$ by
$$
p_i(x_i{\bf a}) \equiv \int p(x_1, \ldots, x_d {\bf a}) \mathop{\prod_{j=1}^d}_{j \ne i} d x_j
$$
with their corresponding cumulative distribution functions
$$
F_i(x_i{\bf a}) \equiv \int_{\infty}^{x_i} p_i(\tau{\bf a}) d \tau .
$$
For each point ${\bf x} \in X$, there is a corresponding
point ${\bf y}$ in a unit $d$dimensional
cube $U_d$ such that $y_i(x_i) \equiv F_i(x_i{\bf a})$, $i = 1, \ldots, d$. The {\it copula density}
is defined on $U_d$ by
\begin{equation}
c({\bf y}({\bf x}){\bf a}) \equiv \frac{p({\bf x}{\bf a})}{\prod_{i=1}^{d} p_i(x_i{\bf a})}.
\label{eq:copuladef}
\end{equation}
Copula density (as well as its corresponding multivariate
distribution function $C({\bf y}{\bf a})$, or just {\it copula})
contains all information
about mutual dependence of individual variables $x_i$.
It can be shown that all copula marginals are uniform and, conversely,
that any distribution on $U_d$ whose marginals are uniform
is a copula~\cite{ref:copulabook}.
Naturally, when the copula and the marginals of some multivariate density
are known, the density itself is expressed by
\begin{equation}
p({\bf x}{\bf a}) = c({\bf y}({\bf x}){\bf a}) \prod_{i=1}^{d} p_i(x_i{\bf a}).
\end{equation}
Note that $c({\bf y}{\bf a}) = 1$ for all ${\bf y}$ if and only if all $x_i$
are independent and the density $p({\bf x}{\bf a})$ is fully
factorizable.
The NPStat package allows its users to model multivariate continuous
distributions using copula and marginals with the aid of
\cname{CompositeDistributionND} class. An object of this class is
normally constructed out of userprovided copula and marginals.
\cname{CompositeDistributionND} object can also be constructed
from histogram bins. Several standard copulas are
implemented: Gaussian, Student's$t$, and
FarlieGumbelMorgenstern~\cite{ref:copulabook}.
The corresponding class names are \cname{GaussianCopula},
\cname{TCopula}, and \cname{FGMCopula} (these classes are
declared in the header file ``npstat/stat/Copulas.hh'').
Empirical multivariate copulas densities can be constructed
as follows. Let's assume that there are no coincident
point coordinates in each dataset variable. We can sort all
$N$ elements of the data set in the increasing order in each coordinate
separately and assign to each data point $i$ a multiindex
$\{m_{i0}, ..., m_{id}\}$. For each dimension $k$, $m_{ik}$
represents the number of the point $i$ in the sequence ordered
by the dimension $k$ coordinate, so that
$0 \le m_{ik} < N$. The empirical copula density, $\mbox{ECD}({\bf x})$,
is then defined by
\begin{equation}
\mbox{ECD}({\bf x}) = \sum_{i=0}^{N1} \prod_{k=0}^{d1} \delta\left(x_k  \frac{m_{ik} + 1/2}{N}\right),
\end{equation}
where $\delta(x)$ is the Dirac delta function. To get a better idea about
locations of these delta functions, think of
an $N \times N \times ... \times N$ uniform grid in $d$ dimensions on
which $N$ points are placed in such a way that no two
points share the same coordinate in any of the dimensions.
In two dimensions, this is like the placement of chess pieces
in the eight queens puzzle~\cite{ref:eightqueens}
which is simplified to use rooks instead of queens~\cite{ref:eightrooks}.
In the NPStat package, empirical copula densities can be approximated by
histograms defined on $U_d$. Construction of empirical copula histograms
can be performed by functions \cname{empiricalCopulaHisto}
(header file ``npstat/stat/empiricalCopulaHisto.hh'') and
\cname{empiricalCopulaDensity}
(header file ``npstat/stat/empiricalCopula.hh''). The integrated empirical copulas can be
constructed on uniform grids by the \cname{calculateEmpiricalCopula}
function (header ``npstat/stat/empiricalCopula.hh'').
The Spearman's rank correlation coefficient, $\rho$, can
be estimated from twodimensional empirical copulas using functions
\cname{spearmansRhoFromCopula} and \cname{spearmansRhoFromCopulaDensity}
declared in the header file ``npstat/stat/spearmansRho.hh''.
These functions evaluate the following integrals numerically:
\begin{align}
\rho &= 12 \int_0^1 \int_0^1 C({\bf y}{\bf a}) dy_0 dy_1  3 & \mbox{\ } & \mbox{used by \textsf{spearmansRhoFromCopula}} \\
\rho &= 12 \int_0^1 \int_0^1 c({\bf y}{\bf a}) y_0 y_1 dy_0 dy_1  3 & \mbox{\ } & \mbox{used by \textsf{spearmansRhoFromCopulaDensity}}
\end{align}
While these two formulas are identical mathematically~\cite{ref:copulabook},
the approximations used
in numerical integration and differentiation will usually lead to slightly
different results returned by these functions for some copula and its
corresponding density.
The Kendall's rank correlation coefficient, $\tau$,
can be estimated from the empirical copulas using the
function \cname{kendallsTauFromCopula} declared in
the header ``npstat/stat/kendallsTau.hh''.
This function evaluates the following formula numerically:
\begin{equation}
\tau = 4 \int_0^1 \int_0^1 C({\bf y}{\bf a}) c({\bf y}{\bf a}) dy_0 dy_1  1.
\end{equation}
The mutual information between variables can be estimated from
copula densities represented on uniform grids according
to Eq.~\ref{eq:entropy}\footnote{Mutual information is
simply the negative of the copula entropy~\cite{ref:copulaentropy}.}.
Decomposition of multivariate densities into the marginals and the
copula is useful not only for a subsequent analysis of mutual
dependencies of the variates but also for implementing nonparametric
density interpolation, as described in the next section.
\section{Nonparametric Interpolation of Densities}
\label{sec:densityinterpolation}
There are numerous problems in High Energy Physics in which construction of some
probability density requires extensive simulations and, due to the CPU
time limitations, can only be performed for a~limited number of
parameter settings. This is typical, for example, for ``mass
templates'' which depend on such parameters as the pole mass of the
particle under study, sample background fraction, detector jet energy
scale, {\it etc}. It is often desirable to have the capability to
evaluate such a density for arbitrary parameter values.
It is sometimes possible to address this problem by postulating
an explicit parametric
model and fitting that model to the sets of simulated distributions.
However, complex dependence of the distribution shapes on the parameter
values often leads to infeasibility of this approach. Then it becomes
necessary to interpolate the densities without postulating a concrete
model.
\subsection{Interpolation of Univariate Densities using Quantile Functions}
\label{sec:interpolatebyquant}
As it was shown in~\cite{ref:read}, a general
interpolation of onedimensional distributions which
leads to very naturally looking results
can be achieved by
interpolating the {\it quantile function} (defined as the
inverse of the cumulative distribution function). While study~\cite{ref:read}
considers linear interpolation in onedimensional parameter space, it is obvious that
similar weighted average interpolation can be easily constructed in
multivariate parameter settings and with higher order interpolation
schemes. In general, the interpolated quantile function for
an~arbitrary parameter value ${\bf a}$ is expressed by
\begin{equation}
\label{eq:quantileinterp1d}
q(y{\bf a}) = \sum_{j=1}^m w_j q(y{\bf a}_j),
\end{equation}
where the weighted quantile functions are summed at the $m$ ``nearby''
parameter settings ${\bf a}_j$ for which the distribution was
explicitly constructed. The weights $w_j$ are normalized by
$\sum_{j=1}^m w_j = 1$. Their precise values depend on the location
of ${\bf a}$ w.r.t. nearby ${\bf a}_j$ and on the interpolation scheme used.
Simple highorder interpolation schemes can be constructed
in onedimensional parameter space by using Lagrange interpolating
polynomials
to determine the weights\footnote{To determine interpolated value of a function using
Lagrange interpolating polynomials, weights are assigned to function
values at a set of nearby points according to a rule
published by Lagrange in 1795. The weights depend only on the
abscissae of the interpolated points and on the coordinate
at which the function value is evaluated but not on the
interpolated function values.}. In $r$dimensional parameter
space, rectangular grids can be utilized with multilinear or
multicubic interpolation. For
example, in the case of multilinear
interpolation, the weights can be calculated as follows:
\begin{thinlist}
\item Find the hyperrectangular parameter grid
cell inside which the value of ${\bf a}$ falls.
\item Shift and scale this cell so that it becomes a hypercube
with diagonal vertices at $(0, 0, ..., 0)$ and $(1, 1, ..., 1)$.
\item Let's designate the shifted and scaled value of ${\bf a}$ by ${\bf z}$,
with components $(z_1, z_2, ..., z_r)$.
If we were to interpolate towards ${\bf z}$
a scalar function $f$ defined at the
vertices with coordinates $(c_1, c_2, \ldots, c_r)$,
where all $c_k$ values are now either 0 or 1,
then we would use the formula
\begin{equation}
\label{eq:multilinear}
f(z_1, z_2, ..., z_r) = \sum_{\substack{
c_1 \in \,\{0, 1\}\\
c_2 \in \,\{0, 1\}\\
\cdot \cdot \cdot \cdot \cdot \cdot \cdot \cdot \cdot \\
c_r \in \,\{0, 1\}\\
}} f(c_1, c_2, ..., c_r) \prod_{k=1}^r z_k^{c_k} (1  z_k)^{1  c_k}
\end{equation}
which is obviously linear in every $z_k$ and has
correct function values when evaluated at the vertices.
In complete analogy, we define the weights for the quantile functions
constructed at the vertices with coordinates $(c_1, c_2, \ldots, c_r)$ to be
$w(c_1, c_2, \ldots, c_r) = \prod_{k=1}^r z_k^{c_k} (1  z_k)^{1  c_k}$.
Naturally, there are $m = 2^r$ weights total.
\end{thinlist}
The quantile interpolation of univariate distributions is implemented
in the NPStat package with the \cname{InterpolatedDistribution1D} class.
A collection of quantile functions is assembled incrementally together
with their weights (formula for calculating the weights is to be supplied
+with their weights (formula for calculating the weights has to be supplied
by the user).
For calculating the interpolated density at point $x$,
the equation $x = q(y{\bf a})$,
with $q(y{\bf a})$ from~\ref{eq:quantileinterp1d},
is solved numerically for $y$. The density is then evaluated by
numerically differentiating the interpolated quantile function:
$p(x{\bf a}) = \left( \frac{\partial q(y{\bf a})}{\partial y} \right)^{1}$.
A class with a simpler interface implementing linear weight assignments
between nearby points in a 1d parameter space is called \cname{InterpolatedDistro1D1P}.

Less reliable but computationally faster linear interpolation of densities (instead of
quantile functions) is implemented for 1d parameter spaces by the
class \cname{VerticallyInterpolatedDistro1D1P}. This class allows
the user to perform interpolation in a transformed space (for example,
the transform can center all distributions at 0 and rescale them to
the same width) in order to alleviate deficiencies of ``vertical''
interpolation.
+Direct interpolation of univariate density functions, also known
+as ``vertical interpolation'', is implemented by the
+\cname{VerticallyInterpolatedDistribution1D} class.
+A class with a simpler interface implementing automatic linear weight assignments
+between nearby points in a 1d parameter space is called \cname{InterpolatedDistro1D1P}.
+This class can be used to interpolate either quantile functions
+or densities, depending on a switch. A~similar class which interpolates
+univariate densities on a rectangular grid in a miltidimensional parameter
+space is called \cname{InterpolatedDistro1DNP}.
\subsection{Interpolation of Multivariate Densities}
\label{sec:interpolatemultivar}
The procedure described in the previous section can be
generalized to interpolate multivariate distributions.
Let $p({\bf x}{\bf a})$ be a multivariate probability density
in a $d$dimensional space $X$ of random variables ${\bf x} \in X$.
${\bf a}$ is a vector of parameters, and
$\int_{X} p({\bf x}{\bf a}) d {\bf x} = 1$
for every ${\bf a}$\footnote{Compared to interpolation of
arbitrary functions, preserving this normalization is one of the
major complications in interpolating multivariate densities.}.
For a~``wellbehaved'' density (Riemannintegrable, {\it etc}),
it is always possible to construct a onetoone
mapping from the space $X$
into the unit $d$dimensional cube, $U_d$, using a sequence of onedimensional
conditional cumulative distribution functions. These functions are defined
as follows:
$$
\begin{array}{c}
F_{1}(x_1x_2, x_3, \ldots, x_d, {\bf a}) \equiv \int_{\infty}^{x_1} p(z_1, x_2, x_3, \ldots, x_d  {\bf a}) d z_1 /
\int_{\infty}^{\infty} p(z_1, x_2, x_3, \ldots, x_d  {\bf a}) d z_1
\\
F_{2}(x_2  x_3, \ldots, x_d, {\bf a}) \equiv \int_{\infty}^{x_2} F_{1}(\infty  z_2, x_3, \ldots, x_d, {\bf a}) d z_2 /
\int_{\infty}^{\infty} F_{1}(\infty  z_2, x_3, \ldots, x_d, {\bf a}) d z_2
\\
F_{3}(x_3  \ldots, x_d, {\bf a}) \equiv \int_{\infty}^{x_3} F_{2}(\infty  z_3, \ldots, x_d, {\bf a}) d z_3 /
\int_{\infty}^{\infty} F_{2}(\infty  z_3, \ldots, x_d, {\bf a}) d z_3
\\
.\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .
\\
F_{d}(x_d  {\bf a}) \equiv \int_{\infty}^{x_d} F_{d1}(\infty  z_d, {\bf a}) d z_d /
\int_{\infty}^{\infty} F_{d1}(\infty  z_d, {\bf a}) d z_d
\end{array}
$$
Naturally, $F_{k}(\infty  x_{k+1}, \ldots, x_d, {\bf a})$ is just renormalized $p({\bf x}{\bf a})$
in which the first $k$ components of ${\bf x}$ are integrated out ({\it i.e.,} marginalized).
The mapping from ${\bf x} \in X$ into ${\bf y} \in U_d$ is defined by $y_i = F_i(x_i  \ldots), \ i = 1, \ldots, d$. In terms of conditional cumulative distribution functions,
$$
p({\bf x}{\bf a}) = \prod_{i=1}^{d}{\frac{\partial F_i(x_i  \ldots)}{\partial x_i}}.
$$
This method is not unique and other mappings
from $X$ to $U_d$ are possible. What makes this particular construction useful
is that the inverse mapping, from $U_d$ into $X$, can be
easily constructed as well: we simply solve the equations
\begin{equation}
\label{eq:invertcondcdf}
y_i = F_i(x_i  \ldots)
\end{equation}
in the reverse order, starting from dimension $d$ and going back
to dimension $1$. Each equation in this sequence {\it has only one unknown}
and therefore it can be efficiently solved numerically
(and sometimes algebraically) by a variety
of standard root finding techniques. The solutions of these equations,
$x_i = q_i(y_i  x_{i+1}, \ldots, x_d, {\bf a}) \equiv F_i^{1}(y_i  \ldots)$, are
the {\it conditional quantile functions} (CQFs). Note that
\begin{equation}
\label{eq:invderiv}
p({\bf x}{\bf a}) = \left( \prod_{\ i=1}^{d}{\frac{\partial q_i(y_i  \ldots)}{\partial y_i}} \right)^{1}.
\end{equation}
Now, if the CQFs are known for some parameter values
${\bf a}_1$ and ${\bf a}_2$,
interpolation towards ${\bf a} = (1  \lambda) {\bf a}_1 + \lambda {\bf a}_2$ is made
in the same manner as described in Section~\ref{sec:interpolatebyquant}:
$x_i = (1  \lambda) q_i(y_i  x_{i+1}, \ldots, x_d, {\bf a}_1) + \lambda q_i(y_i  x_{i+1}, \ldots, x_d, {\bf a}_2)$.
If the CQFs
are known on a~grid in the parameter space, we have to use
an appropriate interpolation technique (multilinear, multicubic, {\it etc})
in that space in order to assign
the weights to the CQFs at the nearby grid points. In general,
the interpolated CQFs are defined by
a weighted average
\begin{equation}
\label{eq:interpq}
q_i(y_i  x_{i+1}, \ldots, x_d, {\bf a}) = \sum_{j=1}^m w_j q_i(y_i  x_{i+1}, \ldots, x_d, {\bf a}_j),
\end{equation}
where the sum is performed over $m$ nearby parameter points,
weights $w_j$ are normalized by $\sum_{j=1}^m w_j = 1$, and their
exact values depend on the parameter grid chosen and the interpolation method
used.
Basically, it is the whole mapping from ${\bf y}$ into ${\bf x}$ which gets
interpolated in this method.
The CQF interpolation results look
very natural, but the process is rather CPUintensive:
for each ${\bf x}$ we need to solve $d$ onedimensional nonlinear equations
\begin{equation}
\label{eq:interpsolve}
x_i = q_i(y_i  x_{i+1}, \ldots, x_d, {\bf a}), \ \ \ \ i = d, \ldots, 1
\end{equation}
in order to determine ${\bf y}$ of the interpolated mapping.
In this process, each call to evaluate $q_i(y_i  x_{i+1}, \ldots, x_d, {\bf a})$
triggers $m$ calls to evaluate $q_i(y_i  x_{i+1}, \ldots, x_d, {\bf a}_j)$.
Depending on implementation details, each of these $m$ calls
may in turn trigger root finding in an equation
like~\ref{eq:invertcondcdf}.
Once ${\bf y}$ is found for the interpolated CQFs,
density is determined by numerical
evaluation of~\ref{eq:invderiv}
in which $q_i(y_i  \ldots)$ are given by~\ref{eq:interpq}.
In practice, finite steps of the parameter grids will cause
dependence of the interpolation results on the order in which
conditional quantiles are evaluated. If choosing some
particular order is considered undesirable and increased CPU
loads are acceptable, all $d!$ possible permutations
should be averaged.
In the NPStat package, multivariate densities whose mapping
from the multidimensional unit cube into the density support
region is implemented via CQFs return ``true'' when
their virtual function \cname{mappedByQuantiles} is called.
Userdeveloped implementations of
multivariate densities should follow this convention as well.
In particular, mapping of
the densities represented by the \cname{BinnedDensityND} class
(lookup tables on hyperrectangular
grids with multilinear interpolation) is performed by CQFs,
as well as mapping of all fully factorizable densities.
When the fidelity of the model is less critical or when the
correlation structure of the distribution is more stable w.r.t.
parameter changes than its location and scales, much faster
multivariate density interpolation can be performed by decomposing
the density into the copula and the marginals. The support
of the copula density $c({\bf y}({\bf x}){\bf a})$ defined
in Eq.~\ref{eq:copuladef}
is always $U_d$ and it does not depend on the
parameter ${\bf a}$. This suggests that the marginals and the copula can
be interpolated separately, using quantile function interpolation
for the marginals and the standard weighted average interpolation
for the copula.
The NPStat package uses the same data structure to perform both
CQFbased and copulabased interpolation of multivariate densities.
Multilinear interpolation is supported on a rectangular parameter
grid (not necessarily equidistant).
The distributions at the grid points are collected together
using the \cname{GridInterpolatedDistribution} template,
while methodspecific
differences are isolated inside a policy class which serves as
the \cname{GridInterpolatedDistribution} template parameter.
Currently, \cname{UnitMapInterpolationND} or
\cname{CopulaInterpolationND} classes can serve as such a~parameter
which results in CQFbased or copulabased interpolation, respectively.
+using the \cname{GridInterpolatedDistribution} class.
+Internally, the interpolation is performed by either
+\cname{UnitMapInterpolationND} or \cname{CopulaInterpolationND}
+class, depending on a switch. These classes perform
+CQFbased and copulabased interpolation, respectively.
\section{Nonparametric Density Estimation}
\label{sec:densityestimation}
The problem of estimating population probability density function
from a finite sample drawn from this population is ubiquitous
in data analysis practice. It is often the case that
there are no substantial reasons for choosing a particular parametric
density model but certain assumptions,
such as continuity
of the density together with some number of derivatives or absence
of narrow spatial features, can still be justified. There is
a~number of approaches by which assumptions of this kind can be introduced
into the statistical model. These approaches are collectively known
as ``nonparametric
density estimation'' methods~{\cite{ref:silverman, ref:izenmann, ref:kde}}. The NPStat
package provides an efficient implementation of one of
such methods, kernel density estimation (KDE), together with its extension
to polynomial density models and densities with bounded support.
This extension is called ``local orthogonal polynomial expansion'' (LOrPE).
\subsection{Kernel Density Estimation (KDE)}
\label{sec:kde}
Suppose, we have an i.i.d.~sample of measurements $x_i$, $i = 0, 1,
..., N1$ from a univariate probability density $p(x)$. The empirical
probability density function (EPDF) for this sample is defined by
\begin{equation}
\mbox{EPDF}(x) = \frac{1}{N} \sum_{i=0}^{N1} \delta (x  x_i),
\label{eq:edf}
\end{equation}
where $\delta(x)$ is the Dirac delta function. $\mbox{EPDF}(x)$
can itself be considered an estimate of $p(x)$. However, this
estimate can be substantially improved if some additional information
about $p(x)$ is available. For example, we can often assume that
$p(x)$ is continuous together with its first few derivatives or
that it can have at most a few modes. In such cases a convolution
of $\mbox{EPDF}(x)$ with a kernel function, $K(x)$, often provides
a much better estimate of the population density. $K(x)$ itself is
usually chosen to be a symmetric continuous density with a location
and scale parameter, so that the resulting estimate looks
like
\begin{equation}
\hat{p}\sub{KDE}(xh) = \frac{1}{N h} \sum_{i=0}^{N1} K \left(\frac{x  x_i}{h}\right).
\label{eq:kde}
\end{equation}
In the context of
density estimation, parameter $h$ is usually referred to as
``bandwidth''.
Use of the Gaussian distribution or one of the distributions from the
symmetric beta family as $K(x)$ is very common. In fact, it is so common
that beta family kernels have their own names in the density estimation
literature. These names are listed in Table~\ref{table:betakernels}.
\begin{table}[h!]
\caption{Onedimensional kernels from the symmetric beta family.
All these kernels look like
${\cal{N}} \, (1  x^2)^p \, I(1 < x < 1)$, where ${\cal{N}}$
is the appropriate normalization factor. In the
NPStat package, their formulae are implemented by the
\cname{SymmetricBeta1D} function.}
\label{table:betakernels}
\begin{center}
\noindent\begin{tabular}{cc} \hline
Kernel name & Power $p$ \\ \hline\hline
Uniform (also called ``boxcar'') & 0 \\ \hline
Epanechnikov & 1 \\ \hline
Biweight (also called ``quartic'') & 2 \\ \hline
Triweight & 3 \\ \hline
Quadweight & 4 \\ \hline
% Quintweight & 5 \\ \hline
\end{tabular}
\end{center}
\end{table}
In the limit $N \rightarrow \infty$ and with proper choice of $h$ so that
$h \rightarrow 0$ and $N h \rightarrow \infty$,
$\hat{p}\sub{KDE}(x)$ becomes a~consistent
estimate of $p(x)$ in terms of integrated squared error (ISE):
\begin{equation}
\mbox{ISE}(h) = \int_{\infty}^{\infty} (\hat{p}\sub{KDE}(xh)  p(x))^2 dx, \ \ \ \ \ \lim_{N \rightarrow \infty} \mbox{ISE}(h) = 0.
\end{equation}
For the method analysis purposes, it is useful to understand
what happens when we reconstruct known densities $p(x)$.
Here, another measure of distance between the true
density and its estimator becomes indispensable,
called ``mean integrated squared error'' (MISE):
\begin{equation}
\mbox{MISE}(h) = E(\mbox{ISE}(h)) = E \left( \int_{\infty}^{\infty} (\hat{p}\sub{KDE}(xh)  p(x))^2 dx \right),
\label{eq:mise}
\end{equation}
where $E(...)$ stands for the expectation value over samples of $N$ points
drawn from $p(x)$.
While other distance measures can be defined
(and may be more relevant for your problem),
$\mbox{MISE}(h)$ is usually
the easiest to analyze
mathematically\footnote{Note that both ISE and MISE are not dimensionless and,
therefore, not invariant under scaling transformations.
It is OK to compare different $\hat{p}\sub{KDE}(xh)$ with each other if
the underlying $p(x)$ is the same, but ISE or MISE comparison for different $p(x)$
requires certain care if meaningful results are to be obtained.}.
A typical goal of such an analysis consists in
finding the value of $h$ which minimizes $\mbox{MISE}(h)$ for
a~sample of given size, and a~significant amount of effort
has been devoted to such bandwidth optimization studies
(see, {\it e.g.,} Refs.~\cite{ref:bwopt, ref:loaderbw} for a review).
A large fraction of these studies employs a~simple MISE approximation valid for large
values of $N$ known as ``AMISE'' (asymptotic MISE). This approximation
includes just the two leading terms known as ``bias'' which increases with
increasing bandwidth and ``variance'' which decreases with increasing bandwidth,
so that the bandwidth optimization procedure is reduced to finding
the best biasvariance tradeoff.
For subsequent discussion, it will be useful to introduce the concept
of kernel order. Let's define the functional
\begin{equation}
\mu_{j}(f) = \int_{\infty}^{\infty} x^j f(x) dx
\label{eq:mufunct}
\end{equation}
which is the $j$th moment of $f(x)$ about 0.
Then it is said that the kernel $K(x)$ is of order $m$ if
\begin{equation}
\mu_0(K) = 1, \ \mu_j(K) = 0 \ \mbox{for} \ j = 1, ..., m1, \ \mbox{and} \ \mu_m(K) \ne 0.
\end{equation}
It can be shown that, for onedimensional
KDE, the rate of AMISE convergence to 0 is proportional
to $N^{\frac{2 m}{2 m + 1}}$ (see, for example, section~2.8
of~\cite{ref:kernsmooth}). Therefore, kernels with high values
of $m$ should be preferred for large samples. At the same time,
only in case $m = 2$ the kernels can be everywhere nonnegative
({\it i.e.,} bona fide densities).
When $m > 2$ (so called ``highorder'' kernels), in order to have
$\mu_2(K) = 0$, $K(x)$ must become negative somewhere.
Therefore, negative values of $\hat{p}\sub{KDE}(xh)$ also
become possible, and a mechanism for dealing with this problem
must be specified. In NPStat, this problem is normally taken care
of by setting negative values of $\hat{p}\sub{KDE}(xh)$ to 0
with subsequent renormalization of the density estimate so that its
integral is 1.
It is instructive to consider $p(x)$, $\mbox{EPDF}(x)$,
and $\hat{p}\sub{KDE}(xh)$ in the frequency
domain\footnote{The Fourier transform of a
probability density is called ``characteristic function'' of the distribution.}. There, the Fourier
transform of $K(x)$ acts as a lowpass filter which suppresses the sampling
noise present in $\mbox{EPDF}(x)$ so that the result, $\hat{p}\sub{KDE}(xh)$,
has a better match
to the spectrum of $p(x)$. By Parseval's identity, this
leads to the reduction of the ISE. Highorder kernels allow
for a sharper frequency cutoff in the filter. Fortunately, the precise shape
of $p(x)$ frequency spectrum
is relatively
unimportant, it is only important that its high frequency components
decay ``fast enough'' so that their suppression together with the
noise does not cause a significant distortion of $\hat{p}\sub{KDE}(xh)$
in comparison with $p(x)$. Powerful automatic bandwidth selection rules
can be derived from this type of analysis if certain
assumptions about the $p(x)$ spectrum decay rate
at high frequencies are satisfied~\cite{ref:chiu2, ref:chiumultivar}.
With a few modifications, the ideas described above can be translated
to multivariate settings. The kernel function becomes a multivariate
density, and the bandwidth parameter becomes, in general, the bandwidth
matrix (for example, the covariance matrix in the case of multivariate
Gaussian kernel).
The NPStat package calculates $\hat{p}\sub{KDE}(xh)$ on an equidistant
grid in one or more dimensions.
Initially, the data sample is histogrammed using a finely
binned histogram and then convoluted with a kernel using
Discrete Fast Fourier Transform (DFFT).
The number of histogram bins, $N_b$, should be selected
taking into account the following considerations:
\begin{thinlist}
\item The bins should be sufficiently small so that no ``interesting''
detail will be missed due to discretization of the density.
\item The bins should be sufficiently small so that the expected optimal
MISE is significantly larger than the ISE due to
density discretization. Detailed exposition of this
requirement can be found in Ref.~\cite{ref:discretizedkde}.
\item DFFT should be efficient for this number of bins. It is best to use
$N_b = 2^k$ bins in each dimension, where $k$ is a positive integer.
\end{thinlist}
The computational complexity of this method is
${\cal O}(N) + {\cal O}(N_b \ln N_b)$ which
is usually much better than the ${\cal O}(N \times N_b)$ complexity of
a ``naive'' KDE implementation. However, for large sample
dimensionalities $N_b$ can become very large which limits the
applicability of this technique. There are other computational
methods (not yet in NPStat) which can work efficiently for
highdimensional samples~\cite{ref:fastkde1, ref:fastkde2}.
The following NPStat functions and classes can be used to perform
KDE and to assist in bandwidth selection:
\cname{amiseOptimalBwGauss} (header file ``npstat/stat/amiseOptimalBandwidth.hh'')  calculates AMISEoptimal bandwidth for
fixedbandwidth KDE with Gaussian kernel as well as highorder kernels
derived from Gaussian. The following formula is implemented:
\begin{equation}
h\sub{AMISE} = \left( \frac{R(K) (m!)^2}{2 m \mu_{m}^2(K) R(p^{(m)}) N} \right)^{\frac{1}{2 m + 1}}.
\label{eq:amisebw}
\end{equation}
In this formula, $R(f)$ denotes the functional
$R(f) = \int_{\infty}^{\infty} f^2(x) dx$ defined
for any squareintegrable function $f$, $\mu_{m}(f)$
is the functional defined in Eq.~\ref{eq:mufunct},
$K$ is the kernel,
$m$ is the kernel order, $p^{(m)}$ is the $m$th derivative of the
reconstructed density\footnote{Of course, in
practice the reconstructed density is usually unknown.
It is expected that a~lookalike density will be substituted.
}, and $N$ is the number of points in the sample.
The expected AMISE corresponding to this bandwidth is calculated from
\begin{equation}
\mbox{AMISE}(h\sub{AMISE}) = \frac{(2 m + 1) R(K)}{2 m \, h\sub{AMISE} \,N}.
\label{eq:bestamise}
\end{equation}
For the origin and further discussion
of these formulae consult, for example, section~2.8 in
Ref.~\cite{ref:kernsmooth}. It is assumed that highorder kernels are
generated according to Eq.~\ref{eq:effk}.
\cname{amiseOptimalBwSymbeta} (header file ``npstat/stat/amiseOptimalBandwidth.hh'')
 calculates AMISEoptimal bandwidth
and corresponding expected AMISE according to
Eqs.~\ref{eq:amisebw} and~\ref{eq:bestamise}
for fixedbandwidth KDE with kernels from symmetric beta family as well
as with highorder kernels derived from symmetric beta distributions.
\cname{amisePluginBwGauss} (header file ``npstat/stat/amiseOptimalBandwidth.hh'')
 Gaussian $p(x)$ is substituted
in Eqs.~\ref{eq:amisebw} and~\ref{eq:bestamise} and the corresponding
quantities are found for the Gaussian kernel as well as highorder kernels
derived from Gaussian.
\cname{amisePluginBwSymbeta} (header file ``npstat/stat/amiseOptimalBandwidth.hh'')
 Gaussian $p(x)$ is substituted
in Eqs.~\ref{eq:amisebw} and~\ref{eq:bestamise} and the corresponding
quantities are found for the kernels from the symmetric beta family as well
as highorder kernels derived from symmetric beta distributions.
\cname{miseOptimalBw} and \cname{gaussianMISE} methods of the
\cname{GaussianMixture1D} class  these methods calculate
MISEoptimal bandwidth and the corresponding exact MISE for
Gaussian mixture densities estimated with the Gaussian kernel
(or highorder kernels derived from the Gaussian) according
to the formulae from Ref.~\cite{ref:bwgaussmix}.
\cname{ConstantBandwidthSmoother1D}  This class implements fixedbandwidth
KDE for onedimensional samples of points using Gaussian kernels
or kernels from the symmetric beta family (including highorder
kernels which are generated internally). Boundary effects can be
alleviated by data mirroring. Kernel convolutions are performed
by DFFT after sample discretization.
\cname{ConstantBandwidthSmootherND}  This class implements fixedbandwidth
KDE for multivariate histograms. Arbitrary density implementations
which inherit from \cname{AbsDistributionND} can be used as weight
functions for generating highorder kernels. Boundary effects can be
alleviated by data mirroring.
\cname{JohnsonKDESmoother}  This class constructs adaptive bandwidth KDE estimates for
onedimensional samples. It operates in several steps:
\begin{thinlist}
\item The sample is discretized using centroidpreserving binning.
\item Population mean, $\mu$, standard deviation, $\sigma$,
skewness, $s$, and kurtosis, $k$, are
estimated by the \cname{arrayShape1D} function.
\item A distribution from the Johnson system with these
values of $\mu$, $\sigma$, $s$, and $k$ is used as a template
for the \cname{amiseOptimalBwSymbeta} (or \cname{amiseOptimalBwGauss})
function which calculates optimal constant bandwidth
according to Eq.~\ref{eq:amisebw}.
\item A pilot density estimate is built by KDE
(using \cname{LocalPolyFilter1D} class) with the bandwidth determined
in the previous step.
\item The sample is smoothed by the \cname{variableBandwidthSmooth1D}
function (described below) using this pilot estimate.
\end{thinlist}
This method achieves better MISE
convergence rate without using highorder kernels (so that density
truncation below zero is unnecessary). Give this method a serious
consideration if you are working with onedimensional samples,
expect to reconstruct a unimodal distribution, and do not have to worry
about boundary effects.
\cname{KDECopulaSmoother}  multivariate KDE in which extra care
is applied in order to ensure that the estimation result
is a bona fide copula ({\it i.e.,} that all of its marginals
are uniform).
This class should normally be used to smooth empirical copula densities
constructed by \cname{empiricalCopulaHisto} or
\cname{empiricalCopulaDensity}.
The bandwidth can be supplied by the user or
it can be chosen by crossvalidation. In general,
\cname{LOrPECopulaSmoother} will produce better results
in this context, so \cname{KDECopulaSmoother} should
be used only in case the slower speed of \cname{LOrPECopulaSmoother}
is deemed unacceptable.
\cname{KDEFilterND}  A collection of \cname{KDEFilterND} objects which utilize
common workspace and DFFT engine can be employed to perform crossvalidation
calculations and bandwidth scans. Such a collection is used, for example,
by the \cname{KDECopulaSmoother} class described above.
If you already know the bandwidth,
the \cname{ConstantBandwidthSmootherND} class
will likely be more convenient to use than this one.
\cname{variableBandwidthSmooth1D}  implements onedimensional
variable kernel adaptive KDE (which should not be confused
with {\it local kernel}, or {\it balloon}, method  see section~2.10 in
Ref.~\cite{ref:kernsmooth} for further discussion). In general, this
approach consists in assigning different bandwidth values to each sample
point. In the \cname{variableBandwidthSmooth1D} function, this assignment
is performed according to the formula
\begin{equation}
h_i = \frac{c}{{\hat \rho}^{\alpha}(x_i)},
\end{equation}
where ${\hat \rho(x)}$ is a~pilot density estimate constructed, for example,
by fixedbandwidth KDE.
Boundary kernel adjustments are performed automatically.
The normalization constant $c$ is
determined so that the geometric mean
of $h_i$ equals to a~userprovided bandwidth. There are
certain reasons to believe that the choice of power
parameter $\alpha = 1/2$ is a~particularly good one~\cite{ref:abramson}.
\cname{simpleVariableBandwidthSmooth1D}  a highlevel driver function for
variable kernel adaptive KDE which uses kernels from the symmetric beta
family and automatically generates a~pilot density estimate employing
AMISE plugin bandwidth.
\subsection{Local Orthogonal Polynomial Expansion (LOrPE)}
\label{sec:lorpe}
Local Orthogonal Polynomial Expansion (LOrPE) can be viewed as
a convenient method
for creating kernels with desired properties (including highorder
kernels) and for eliminating the KDE boundary bias\footnote{If you
are familiar with orthogonal series density estimators (OSDE), you can
also view LOrPE as a~localized version of OSDE.}.
LOrPE amounts to constructing a truncated expansion of the EPDF
defined by Eq.~\ref{eq:edf}
into orthogonal polynomial series near each point $x\sub{fit}$
where we want to build the initial density estimate:
\begin{equation}
\label{eq:expansion}
\hat{f}\sub{LOrPE}(xh) = \sum_{k=0}^{M}c_{k}(x\sub{fit}, h) P_{k}\left(\frac{x  \xf}{h}\right).
\end{equation}
The polynomials $P_{k}(x)$ are built to satisfy the normalization condition
\begin{equation}\label{eq:norm0}
\frac{1}{h} \int_{a}^{b} P_{j}\left(\frac{x  \xf}{h}\right)P_{k}\left(\frac{x  \xf}{h}\right) K\left(\frac{x  \xf}{h}\right)dx = \delta_{jk},
\end{equation}
which is equivalent to
\begin{equation}
\label{eq:norm}
\int_{(a  x\sub{fit})/h}^{(b  x\sub{fit})/h} P_{j}(y)P_{k}(y)K(y)dy = \delta_{jk},
\end{equation}
where $\delta_{jk}$ is the Kronecker delta,
$K(x)$ is a suitably chosen kernel function,
and $[a, b]$ is the support interval of the estimated density.
For commonly used kernels from
the beta family (Epanechnikov, biweight, triweight, {\it etc}.), condition (\ref{eq:norm})
generates the normalized Gegenbauer polynomials (up to a common multiplicative
constant) at points $\xf$ sufficiently deep inside the support
interval, provided $h$ is small enough to guarantee that
$(a\xf)/h\leq 1$ and $(b\xf)/h\geq 1$.
If $\xf$ is
sufficiently close to the boundaries of the density
support $[a,b]$ then the polynomial system will vary
depending on $\xf$, and the notation
$P_{k}(\cdot,\xf)$ would be more appropriate. However, in the subsequent
text this dependence on $\xf$ will be suppressed in order to simplify
the notation.
The expansion coefficients $c_{k}(x\sub{fit}, h)$
are determined by the usual scalar product of the expanded function with
$P_{k}$:
\begin{equation}\label{eq:convol}
c_{k}(x\sub{fit}, h) = \frac{1}{h} \int \mbox{EPDF}(x) P_{k}\left( \frac{x  x\sub{fit}}{h} \right)K\left( \frac{x  x\sub{fit}}{h} \right)dx,
\end{equation}
which, after substituting $\mbox{EPDF}(x)$ from~\ref{eq:edf}, leads to
\begin{equation}\label{eq:empck}
c_{k}(x\sub{fit}, h) = \frac{1}{Nh}\sum_{i=0}^{N1}P_{k}\left( \frac{x_i  x\sub{fit}}{h} \right) K\left( \frac{x_i  x\sub{fit}}{h} \right).
\end{equation}
Note that the coefficients $c_{k}(x\sub{fit}, h)$ calculated according to
Eq.~\ref{eq:convol} can be naturally
interpreted as localized expectation values of orthogonal polynomials
$P_{k}\left( \frac{x  x\sub{fit}}{h} \right)$ w.r.t. probability
density $\mbox{EPDF}(x)$ in which localization weights are given by
$\frac{1}{h} K\left( \frac{x  x\sub{fit}}{h} \right)$.
The density estimate at $x\sub{fit}$ is defined by
\begin{equation}
\hat{p}\sub{LOrPE}(x\sub{fit}h) = \max\{0, \hat{f}\sub{LOrPE}(x\sub{fit}h)\}.
\label{eq:trunc}
\end{equation}
In general, LOrPE does not produce a bona fide density (in this respect it is similar to the
orthogonal series estimator), and after calculating
the density for all $x\sub{fit}$ values one has to perform the overall renormalization.
Equation~\ref{eq:expansion} can be usefully generalized as follows:
\begin{equation}\label{eq:lorpe}
\hat{f}\sub{LOrPE}(xh) = \sum_{k=0}^{\infty} g(k) c_{k}(x\sub{fit}, h) P_{k}\left( \frac{x  x\sub{fit}}{h} \right).
\end{equation}
Here, $g(k)$ is a ``taper function''. Normally, $g(0) = 1$ and
there is an integer $M$ such that $g(k) = 0$ for any $k > M$. The
taper function suppresses
high order terms gradually instead of using a
sharp cutoff at $M$.
When evaluated at points $x\sub{fit}$
which are sufficiently far away from the density support
boundaries and if $K(x)$ is an even function, Eq.~\ref{eq:lorpe}
is equivalent to a~kernel density estimator with the effective
kernel
\begin{equation}\label{eq:effk}
K\sub{eff}(x) = K(x) \sum_{j=0}^{\infty} g(2 j) P_{2 j}(0) P_{2 j}(x).
\end{equation}
Moreover, if $g(k)$ is a step function,
{\it i.e.,}~$g(k) = 1$ for all $k \le M$ and $g(k) = 0$ for all $k > M$,
it can be shown that the effective kernel is of order $M+1$ if $M$
is odd and $M+2$ if $M$ is
even~\cite{ref:placida}\footnote{For such kernels the sum in Eq.~\ref{eq:effk}
can be
reduced to a simple algebraic form via the ChristoffelDarboux identity.
However, the resulting formula is not easy to evaluate in a numerically
stable manner near $x = 0$.}.
A slightly different modification of LOrPE is based on the following
identity:
\begin{equation}\label{eq:deltasum}
\frac{1}{h} \sum_{j=0}^{\infty} P_{j}\left(\frac{x  x_i}{h}\right)P_{j}\left(0\right) K\left(\frac{x  x_i}{h}\right) = \delta(x  x_i).
\end{equation}
Substituting this into Eq.~\ref{eq:edf}, we obtain
\begin{equation}
\mbox{EPDF}(x) = \frac{1}{N h} \sum_{i=0}^{N1} \sum_{j=0}^{\infty} P_{j}\left(\frac{x  x_i}{h}\right)P_{j}\left(0\right) K\left(\frac{x  x_i}{h}\right).
\end{equation}
The modified density estimate is obtained from this expansion by introducing a taper:
\begin{equation}
\hat{\hat{f}}\sub{LOrPE}(x\sub{fit}h) = \frac{1}{N h} \sum_{i=0}^{N1} \sum_{j=0}^{\infty} g(j) P_{j}\left(\frac{x\sub{fit}  x_i}{h}\right)P_{j}\left(0\right) K\left(\frac{x\sub{fit}  x_i}{h}\right)
\label{eq:lorpe2}
\end{equation}
and then truncation is handled just like in Eq.~\ref{eq:trunc}.
For even kernels and points $x\sub{fit}$ sufficiently far away from the density support
boundaries, Eq.~\ref{eq:lorpe2} is equivalent to Eq.~\ref{eq:lorpe}
evaluated at $x = x\sub{fit}$. However, this is no longer true near the boundaries.
Perhaps, the easiest way to think about it is that, in Eq.~\ref{eq:lorpe2},
an effective kernel is placed at the location of each data point
(and, in general, shapes of these effective kernels
depend on the data point location $x_i$).
All these kernels are then summed to obtain the density estimates
at all $x\sub{fit}$.
On the other hand, in Eq.~\ref{eq:lorpe} the effective kernel is placed
at the location of each point at which the density estimate is made (so
that the effective kernel shape depends on $x\sub{fit}$).
This kernel generates the weights for each data point which are summed to
obtain the estimate. The difference between these two approaches
is usually ignored for KDE
(simple KDE is unable to deal with boundary bias anyway),
but for LOrPE substantially different results can be obtained near
the boundaries.
It is not obvious apriori which density estimate is better:
$\hat{\hat{f}}\sub{LOrPE}$ from Eq.~\ref{eq:lorpe2}
or $\hat{f}\sub{LOrPE}$ from Eq.~\ref{eq:lorpe},
although some preliminary experimentation with simple distributions
does indicate that
$\hat{f}\sub{LOrPE}$ typically results in smaller MISE.
The integral of the $\hat{\hat{f}}\sub{LOrPE}$ estimate
on the complete density support interval is automatically 1
which is not true for $\hat{f}\sub{LOrPE}$. On the other
hand, $\hat{f}\sub{LOrPE}$ admits an appealing interpretation
in terms of the local density expansion~\ref{eq:lorpe} in which
the localized expectation values of the orthogonal polynomials
$P_k$ are matched to their observed values
in the data sample (this also leads to automatic matching
of localized distribution moments about $x\sub{fit}$).
Onedimensional KDE with fixed kernel $K(x)$ has only one important parameter
which regulates the amount of smoothing:
bandwidth $h$. LOrPE has two such parameters: bandwidth $h$ and the highest
polynomial order $M$ (or, in general, the shape of the taper function).
It is intuitively obvious that polynomial modeling
should result in a smaller bias than KDE for densities
with several (at least $M$) continuous
derivatives, and that a~proper balance of $h$ and $M$ should
result in a better estimator overall.
LOrPE calculations remain essentially unchanged
in multivariate settings: the only difference is switching to multivariate
orthogonal polynomial systems.
Even though LOrPE is equivalent to KDE far away from the density support
boundaries, LOrPE does not suffer from the boundary bias because
Eq.~\ref{eq:norm} automatically adjusts the shape of
orthogonal polynomials near the boundary. This makes LOrPE
applicable in a~wider set of problems than KDE.
In addition to just making better estimates of densities with
a~sharp cutoff at the boundary, LOrPE fixes the main
problem with some existing KDEbased methods which are rarely
used in practice due to their severe degradation from
the boundary bias. Examples of such methods include transformation
kernel density estimation
(see, for example, section 2.10.3 of~\cite{ref:kernsmooth})
in which the transformation target is the uniform distribution,
as well as separate estimation
of the marginals and the copula for multivariate densities.
Unfortunately, LOrPE improvements over KDE do not come without
a price in terms of the algorithmic complexity of the method.
The density estimate can no longer be represented as a simple
convolution of the sample EPDF and a kernel. Because of this,
DFFTbased calculations are no longer sufficient.
In the LOrPE implementation within NPStat, simple algorithms
are used instead which perform pointwise convolutions. Their
computational complexity scales as ${\cal O}(N) + {\cal O}(N_b N_s)$, where
$N_b$ is the number of bins in the sample discretization
histogram and $N_s$ is the number of bins of the same length
(area, volume, {\it etc}) inside the kernel support. For
large bandwidth values (or for kernels with infinite support)
this essentially becomes ${\cal O}(N) + {\cal O}(N_b^2)$ which
can be significantly slower than the KDE implementation based on DFFT.
The following NPStat classes and functions can be used to perform LOrPE
and to assist in choosing the bandwidth:
\cname{LocalPolyFilter1D}  LOrPE for onedimensional samples.
A numerical GramSchmidt procedure is used
to build polynomials defined by Eq.~\ref{eq:norm0}
on an equidistant grid. A linear filter which combines
formulae~\ref{eq:convol} and~\ref{eq:lorpe} is then
constructed for each grid point $\xf$ from the density
support region (the same ``central'' filter is used for
all $\xf$ points far away from the support boundaries).
The \cname{filter} method of the class can then be used to
build density estimates defined by Eq.~\ref{eq:lorpe} with $x = \xf$
from sample histograms. Alternatively, the \cname{convolve} method
can be used to to make estimates according to Eq.~\ref{eq:lorpe2}.
If necessary, subsequent truncation of the reconstructed
densities below 0 together with renormalization should be performed
by the user.
\cname{WeightTableFilter1DBuilder}
(header file ``npstat/stat/WeightTableFilter1DBuilder.hh'') 
Helper class designed to work with \cname{LocalPolyFilter1D}.
This helper constructs linear filters out of arbitrary scanned weights
utilizing orthogonal polynomials. If it is necessary to introduce exclusion
regions into the data (for example, for the purpose of interpolating
background density from sidebands), constructor of this class is the place
where this can be done.
\cname{NonmodifyingFilter1DBuilder}
(header file ``npstat/stat/WeightTableFilter1DBuilder.hh'') 
Helper class designed to work with \cname{LocalPolyFilter1D}.
This filter does not change the data ({\it i.e.}, this filter is
represented by the unit matrix).
\cname{getBoundaryFilter1DBuilder}
(header file ``npstat/stat/AbsFilter1DBuilder.hh'')  This
function can be used to construct various filters that inherit
from \cname{AbsBoundaryFilter1DBuilder} class. These filters differ
by the kernel adjustments they perform near the density support boundaries.
Many filters of this kind are declared in the
``npstat/stat/Filter1DBuilders.hh'' header file.
\cname{PolyFilterCollection1D}  A collection of \cname{LocalPolyFilter1D}
objects which can be used, for example, in bandwidth scans or in
crossvalidation calculations.
\cname{LocalPolyFilterND}  similar to \cname{LocalPolyFilter1D} but
intended for estimating multivariate densities.
\cname{SequentialPolyFilterND}  similar to \cname{LocalPolyFilterND}
but each dimension is processed sequentially using 1d filtering.
Employs a collection of \cname{LocalPolyFilter1D} objects, one for
each dataset dimension.
\cname{LOrPECopulaSmoother}  multivariate LOrPE in which extra care
is applied in order to ensure that the estimation result
is a bona fide copula ({\it i.e.,} that all of its marginals
are uniform).
This class should normally be used to smooth empirical copula densities
constructed by \cname{empiricalCopulaHisto} or
\cname{empiricalCopulaDensity}.
The bandwidth can be supplied by the user or
it can be chosen by crossvalidation. Less reliable but faster
calculations of this type can be performed with the
\cname{KDECopulaSmoother} class described in Section~\ref{sec:kde}.
\cname{SequentialCopulaSmoother}  similar to \cname{LOrPECopulaSmoother}
but each dimension is processed sequentially using 1d filtering.
\cname{NonparametricCompositeBuilder}  a highlevel API for estimating
multivariate densities by applying KDE or LOrPE separately to
each marginal and to the copula. This class builds
\cname{CompositeDistributionND} objects from collections of sample points.
\cname{symbetaLOrPEFilter1D} (header file
``npstat/stat/LocalPolyFilter1D.hh'')  a convenience utility
for building onedimensional LOrPE filters using kernels from
the symmetric beta family (including the Gaussian).
\cname{lorpeMise1D} (header file ``npstat/stat/lorpeMise1D.hh'')
 calculates LOrPE MISE for an arbitrary known density
according to Eq.~\ref{eq:mise}. The support of the density is
split into $M$ subintervals. It is assumed that the sample
points are distributed in these subintervals according to the
multinomial distribution. The covariance matrix of this
distribution is then propagated to the density estimation
result. The trace of the propagated covariance, multiplied by the width of the subinterval,
is added to the integrated squared bias in order to obtain the MISE. Of course, this method
reproduces Eq.~\ref{eq:mise} exactly only in the limit $M \rightarrow \infty$,
while in practice the $O(M^3)$ computational complexity of the error propagation
formulae (based on conventional matrix multiplication) will necessitate a~reasonable
choice of finite~$M$. I suggest choosing $M$ in such a manner
that the ISE introduced by the discretization of the density is
significantly smaller than the estimated MISE.
+ Further remarks on theoretical development of LOrPE,
+ as well as a comparison of its performance with
+ other density estimation techniques,
+ can be found in~\cite{ref:lorpe}.
\subsection {Density Estimation with Bernstein Polynomials}
\label{sec:bernstein}
Density representation by Bernstein polynomial series
is an alternative
approach which can be used to alleviate the boundary bias problem of KDE.
Bernstein polynomials are defined as follows:
\begin{equation}
b_{m,n}(x) = C_n^m x^m (1  x)^{n  m},
\label{eq:bpoly}
\end{equation}
where $m = 0, 1, ..., n$ and $C_n^m$ are the binomial coefficients:
\begin{equation}
C_n^m = \frac{n!}{m! (nm)!}.
\end{equation}
In the density estimation context, Bernstein polynomials are often
generalized to noninteger values of $n$ and $m$ in which case
$C_n^m$ is replaced by
$\frac{\Gamma(n + 1)}{\Gamma(m + 1) \Gamma(n  m +1)}$. Up to
normalization, this representation is equivalent to the
beta distribution with parameters $\alpha = m + 1$ and $\beta = n  m + 1$.
For notational simplicity, I will use the term ``Bernstein polynomials''
even if $n$ and $m$ are not integer.
There are two substantially different variations of this density estimation
method. In the first scheme~\cite{ref:betakern},
Bernstein polynomials are used
as variableshape kernels in a KDElike formula:
\begin{equation}
\hat{f}\sub{B}(xn) = \frac{n + 1}{N} \sum_{i=0}^{N1} b_{m(x), n}(x_i),
\label{eq:betakernels}
\end{equation}
where it is assumed
that the reconstructed density is supported on the [0, 1] interval
(naturally, this interval can be shifted and scaled as needed).
The requirement of asymptotic estimator consistency does not
fix $m(x)$ uniquely, and this mapping can be chosen in a~variety
of ways. In NPStat, the following relationship is implemented:
\begin{equation}
m(x) = \left\{ \begin{array}{ll}
c & \mbox{ if } x (n  2 s) + s \leq c \\
x (n  2 s) + s & \mbox{ if } c < x (n  2 s) + s < n  c \\
n  c & \mbox{ if } x (n  2 s) + s \geq n  c
\end{array} \right.
\end{equation}
This formula reproduces the simple mapping $m(x) = x n$ considered in
Ref.~\cite{ref:betakern} in case $s = 0$ and $c = 0$. The offset parameter
$s$ plays the role of effective Bernstein polynomial degree used
at $x = 0$ and regulates the amount of boundary bias. Meaningful
values of $s$ lie between $1$ and 0. As $m \rightarrow 1$, the mean
of the generalized Bernstein polynomial kernels tends to $x$ so that
the estimator
becomes uniformly unbiased for linear density functions.
At the same time, the width of
the kernel tends to 0 at the boundary which leads to an increase in the estimator
variance. As it makes little sense to use kernels whose width
is smaller than the discretization bin size, the cutoff parameter $c$
was introduced. This cutoff effectively limits kernel width from below
in a manner which preserves asymptotic consistency of the estimator
(the useful range of $c$ values is also [1, 0]).
In addition to appropriate selection of the main
bandwidth parameter $n$, proper choice of parameters $s$ and $c$
can significantly improve estimator convergence at the boundary.
In the second variation of this density estimation technique~\cite{ref:babu},
the polynomials are chosen
based on the location of the observed points:
\begin{equation}
\hat{\hat{f}}\sub{B}(xn) = \frac{n + 1}{N} \sum_{i=0}^{N1} b_{m(x_i), n}(x).
\label{eq:bern2}
\end{equation}
For any $m$ and $n$,
$\int_0^1 b_{m,n}(x) dx = \frac{1}{n + 1}$, so this particular estimate
is a bona fide density. Due to the reasons that will be mentioned later
in this subsection,
it can be advantageous to keep integer $m$ and $n$ in this approach.
As in the case of $x$dependent kernel shape, there is
some amount of freedom in the $m(x_i)$ assignment. For asymptotic
consistency we must require that $m(x_i)/n \rightarrow x_i$ as
$n \rightarrow \infty$. However, such
assignments are not unique. One can choose, for example,
$m(x_i) = \lfloor x_i/(n + 1) \rfloor$ as in Ref.~\cite{ref:babu}
(the symbol $\lfloor \cdot \rfloor$ stands for the ``floor''
function), but it can also be useful to assign more than one polynomial
to $x_i$. The following scheme is implemented in NPStat in addition to
the $m(x_i)$ assignment just mentioned. First, an
integer $k$ is found such that $k + 0.5 \leq x_i (n + 1) < k + 1.5$.
Then, if $0 \leq k < n$,
\begin{equation}
m(x_i) = \left\{ \begin{array}{ll}
k & \mbox{ with weight } k + 1.5  x_i (n + 1) \\
k + 1 & \mbox{ with weight } x_i (n + 1)  k  0.5
\end{array} \right.
\end{equation}
If $k < 0$ then $m(x_i) = 0$ with weight 1, and if $k >= n$ then $m(x_i) = n$
with weight~1.
The polynomial weighting schemes actually implemented in the code are
slightly more complicated as they take into account data binning.
With integer values of $m$ and $n$, density estimates constructed according
to Eq.~\ref{eq:bern2} (or its weighted version just described) have an important
property of being positive doubly stochastic. What it means is that not only
$\int_0^1 \hat{\hat{f}}\sub{B}(xn) dx = 1$
but also a sum of an arbitrary number of separate $\hat{\hat{f}}\sub{B}(xn)$
estimators will be flat in $x$
as long as the sum of $x_i$ values used to build all these estimators
is itself flat. If the $x_i$ values are flat
between 0 and 1 then assigned $m$ values will be flat
between 0 and n (inclusive). Then double stochasticity
follows directly from the partition
of unity property of Bernstein polynomials: $\sum_{m=0}^n b_{m,n}(x) = \sum_{m=0}^n C_n^m x^m (1  x)^{n  m} = [x + (1  x)]^n = 1$.
A collection of positive double stochastic estimators can be used for copula
filtering by sequentially applying these estimators in each dimension (with the help
of \cname{SequentialPolyFilterND} class).
If the initial data set processed by this sequence represents a copula density
(for example, in case it is an empirical copula density histogram), the result
is also guaranteed to be a copula density.
In NPStat, any density estimator
implemented via the \cname{LocalPolyFilter1D} class
can be turned into closest (in some sense) doubly stochastic estimator by
calling the \cname{doublyStochasticFilter} method of that class.
Nonnegative estimators will be converted into nonnegative
doubly stochastic estimators
using an iterative procedure similar to the one described in~\cite{ref:sinkhorn}
while filters with negative entries will be converted into generalized doubly
stochastic filters according to the method described in~\cite{ref:khoury}.
The following facilities are provided by NPStat for estimating
densities with Bernstein polynomials and beta distribution kernels:
\cname{BetaFilter1DBuilder}  Constructs linear filters for
\cname{LocalPolyFilter1D} class according to Eq.~\ref{eq:betakernels}.
\cname{BernsteinFilter1DBuilder}  Constructs linear filters for
\cname{LocalPolyFilter1D} according to Eq.~\ref{eq:bern2}. These
filters are intended for use with the \cname{convolve} method
of \cname{LocalPolyFilter1D} class rather than the \cname{filter} method.
\cname{betaKernelsBandwidth}  This function estimates optimal
bandwidth (order $n$ of the generalized Bernstein polynomial)
for Eq.~\ref{eq:betakernels}
according to the AMISE calculation
presented in~\cite{ref:betakern}. This bandwidth estimate should not be taken
very seriously for realistic sample sizes, as finite sample performance
of Bernstein polynomial methods is not well understood.
It is worth mentioning that there are other ways to create
doubly stochastic filters. For example, the filter which
represents the discretized Green's function for the homogeneous
1d heat equation with the Neumann boundary conditions is doubly
stochastic. This particular filter can be constructed
with the help of the \cname{symbetaLOrPEFilter1D} function,
using Gaussian kernel and specifying 0 for the degree of the
LOrPE polynomial as well as \textsf{BM\_FOLD} for the boundary
handling option. Subsequently, the \cname{convolve} method of this filter
should be utilized to smooth the data.
\subsection {Using CrossValidation for Choosing the Bandwidth}
\label{sec:crossval}
Crossvalidation is a technique for adaptive bandwidth selection
applicable to both KDE and LOrPE. Two types of crossvalidation
are supported by NPStat: least squares and pseudolikelihood.
The least squares crossvalidation is based on the following idea.
The MISE from Eq.~\ref{eq:mise} can be written as
\begin{eqnarray*}
\mbox{MISE}(h) & = & E \left( \int_{\infty}^{\infty} (\hat{p}(xh)  p(x))^2 dx \right) \\ & = & E \left( \int_{\infty}^{\infty} \hat{p}^2(xh) dx \right)  2 E \left( \int_{\infty}^{\infty} \hat{p}(xh) p(x) dx \right) +
\int_{\infty}^{\infty} p^2(x) dx.
\label{eq:lsqcv}
\end{eqnarray*}
The last term, $\int_{\infty}^{\infty} p^2(x) dx$, does not depend on $h$,
so minimization of MISE is equivalent to minimization of $B(h) \equiv E \left( \int_{\infty}^{\infty} \hat{p}^2(xh) dx  2 \int_{\infty}^{\infty} \hat{p}(xh) p(x) dx \right).$ Of course, $p(x)$ itself is unknown. However, it can be
shown (as in section 3.4.3 of~\cite{ref:silverman})
that an unbiased estimator of $B(h)$ can be constructed as
\begin{equation}
\mbox{LSCV}(h) = \int_{\infty}^{\infty} \hat{p}^2(xh) dx  \frac{2}{N} \sum_{i=0}^{N1} \hat{p}_{1,i}(x_i, h),
\label{eq:lscv}
\end{equation}
where $\hat{p}_{1,i}(x, h)$ is a ``leavingoneout'' density estimator
to which the point at $x_i$ does not contribute. For example, in the
case of KDE this estimator is defined by
\begin{equation}
\hat{p}_{1,i}(xh) = \frac{1}{(N1) \, h} \sum_{\stackrel{j=0}{j \ne i}}^{N1} K \left(\frac{x  x_j}{h}\right).
\end{equation}
Minimization of $\mbox{LSCV}(h)$ can lead to a reasonable bandwidth estimate,
$h\sub{LSCV}$. However, as $h\sub{LSCV}$ is itself a random quantity,
its convergence towards the bandwidth that optimizes MISE, $h\sub{MISE}$,
is known to be rather slow. Moreover, $\mbox{LSCV}(h)$ can have
multiple minima, so its minimization is best carried out by simply
scanning $h$ within a certain range in the proximity of some value
$h^{*}$ suggested by plugin methods. For more information on the
issues related to the least squares crossvalidation see
Refs.~\cite{ref:silverman, ref:kernsmooth}.
The pseudolikelihood crossvalidation (also sometimes called
likelihood crossvalidation) is based on maximizing the ``leavingoneout''
likelihood:
\begin{equation}
\mbox{PLCV}(h) = \prod_{i=0}^{N1} \hat{p}_{1,i}(x_i, h)
\end{equation}
The criterion of maximum $\mbox{PLCV}(h)$
can be obtained by minimizing an~approximate
KullbackLeibler distance between the density and its
estimate~\cite{ref:silverman}.
Maximizing $\mbox{PLCV}(h)$ is only appropriate in certain
situations. In particular, whenever $\hat{p}_{1,i}(x_i, h)$
becomes 0 even for a~single point, this criterion fails
to produce meaningful results. Its use is also problematic
for densities with infinite support due to the strong
influence fluctuations in the distribution tails
exert on $\mbox{PLCV}(h)$. Because of these problems, NPStat
implements a ``regularized'' version of $\mbox{PLCV}(h)$
defined by
\begin{equation}
\mbox{RPLCV}(h) = \prod_{i=0}^{N1} \max \left(\hat{p}_{1,i}(x_i, h),\, \frac{\hat{p}\sub{self,$i$}(x_i, h)}{N^{\alpha}}\right),
\label{eq:rplcv}
\end{equation}
where $\alpha$ is the regularization parameter chosen by the user
($\alpha = 1/2$ usually works reasonably well) and
$\hat{p}\sub{self,$i$}(x, h)$ is the contribution of the data point at $x_i$
into the original density estimator. For KDE, this contribution is
\begin{equation}
\hat{p}\sub{self,$i$}(xh) = \frac{1}{N h} K \left(\frac{x  x_i}{h}\right).
\end{equation}
If the bandwidth is fixed, $\hat{p}\sub{self,$i$}(x_ih) = K(0)/(Nh)$
for every point $x_i$.
$\hat{p}\sub{self,$i$}(x_i, h)$ changes from one point to another for
LOrPE and for variablebandwidth KDE.
Crossvalidation in the NPStat package is implemented for discretized
KDE and LOrPE density estimators.
It is assumed that the optimal bandwidth corresponds
to the maximum of some quantity, as in the case of $\mbox{RPLCV}(h)$.
All classes which perform crossvalidation for univariate densities
inherit from the abstract base class \cname{AbsBandwidthCV1D}. For
multivariate densities, the corresponding base class is \cname{AbsBandwidthCVND}
(both of these base classes are declared in the header file
``npstat/stat/AbsBandwidthCV.hh''). The following concrete classes
can be used:
\cname{BandwidthCVLeastSquares1D}  implements Eq.~\ref{eq:lscv} for
KDE and LOrPE in 1d.
\cname{BandwidthCVLeastSquaresND}  implements Eq.~\ref{eq:lscv} for
multivariate KDE and LOrPE.
\cname{BandwidthCVPseudoLogli1D}  implements Eq.~\ref{eq:rplcv} for
KDE and LOrPE in 1d.
\cname{BandwidthCVPseudoLogliND}  implements Eq.~\ref{eq:rplcv} for
multivariate KDE and LOrPE.
\noindent The crossvalidation classes are used internally by such highlevel
classes as \cname{KDECopulaSmoother}, \cname{LOrPECopulaSmoother}, and
\cname{SequentialCopulaSmoother}.
\subsection {The Nearest Neighbor Method}
Using NPStat tools, a simple density estimation algorithm similar to
the $k$nearest neighbor method~\cite{ref:silverman}
can be implemented for onedimensional samples as follows:
\begin{thinlist}
\item Discretize the data using a finely binned histogram.
\item Convert this histogram into a distribution by constructing a \cname{BinnedDensity1D} object.
\item For any point $x$ at which a density estimate is desired,
calculate the corresponding cumulative distribution value $y = F(x)$.
\item For some interval $\Delta$, $0 < \Delta < 1$, estimate the density
at $x$ by
\begin{equation}
\hat{p}\sub{NN}(x) = \frac{\Delta}{q(y + \Delta/2)\,  \,q(y  \Delta/2)},
\end{equation}
where $q(y)$ is the quantile function: $q(F(x)) = x$.
This formula assumes $y  \Delta/2 \ge 0$ and $y + \Delta/2 \le 1$.
If $y + \Delta/2 > 1$ then the $\hat{p}\sub{NN}(x)$ denominator should
be replaced by $q(1)\,  \,q(1  \Delta)$, and if $y  \Delta/2 < 0$
then the denominator should become $q(\Delta)\,  \,q(0)$.
\end{thinlist}
In this approach, the parameter $\Delta$ plays the same role as the $k/N$ ratio
in the standard $k$nearest neighbor method.
For best results, $\Delta$ should scale with the number
of sample points as $N^{1/5}$,
and the optimal constant of proportionality depends on
the estimated density itself~\cite{ref:silverman}.
The $k$nearest neighbor method (and its modification just described)
is not recommended for estimating
densities with infinite support as it leads to a diverging
density integral.
For multivariate samples, a similar estimate can be constructed with
the help of \cname{HistoNDCdf} class. Its method \cname{coveringBox}
can be used to find the smallest $d$dimensional box with the given center
and fixed proportions which encloses the desired sample fraction.
\section{Nonparametric Regression}
\label{sec:npregression}
``Regression'' refers to a form of data analysis
in which the behavior of the dependent variable, called ``response'',
is deduced as a function of the independent variable, called ``predictor'',
from a sample of observations.
The response values are considered random ({\it e.g.}, contaminated by
noise) while the predictor values are usually assumed to be deterministic.
The analysis purpose is thus to determine the location parameter
of the response distribution
(mean, median, mode, {\it etc}) as a function of the predictor.
In the NPStat algorithms, the response is always assumed to be a
univariate quantity while the predictor can be either univariate or
multivariate. In the discussion below, predictor will be denoted
by ${\bf x}$, response by $y$, $\mu({\bf x})$ will be used
to describe the response location function, and $\hat{\mu}({\bf x})$
will denote an~estimator of $\mu({\bf x})$.
``Nonparametric regression'' refers to a form of regression analysis
in which no global parametric model is postulated for $\mu({\bf x})$.
Instead, for every ${\bf x}\sub{fit}$, $\mu({\bf x})$ is described
in the vicinity of ${\bf x}\sub{fit}$ by a relatively
simple model which is fitted using sample points located nearby.
Further discussion of $\mu({\bf x})$ estimation depends critically
on the assumptions which can be made about the distribution of
response values.
\subsection{Local Least Squares}
With the additional assumption of Gaussian error distribution ({\it i.e.,}
$y_i = \mu({\bf x}_i) + \epsilon_i$, where $\epsilon_i$ are normally distributed
with mean 0 and standard deviation $\sigma_i$), the
model fitting can be efficiently performed by the method of local
least squares. In this method, $\hat{\mu}({\bf x})$ is found by
minimizing the quantity:
\begin{equation}
\chi^2({\bf x}\sub{fit}, h) = \sum_{i=0}^{N1} \left(\frac{y_i  \hat{\mu}({\bf x}_i{\bf x}\sub{fit},h)}{\sigma_i}\right)^2 K_h({\bf x}_i  {\bf x}\sub{fit}).
\label{eq:localleastsq}
\end{equation}
Here, $h$ refers to one or more parameters which determine
the extent of the kernel $K_h({\bf x})$.
In NPStat, $\hat{\mu}({\bf x}{\bf x}\sub{fit},h)$ is usually decomposed
into orthogonal polynomials. In the case of univariate predictor,
\begin{equation}
\hat{\mu}(xx\sub{fit},h) = \sum_{k=0}^{M} \hat{a}_k(x\sub{fit},h) P_k\left(\frac{x  x\sub{fit}}{h}\right),
\label{eq:regmodel}
\end{equation}
where polynomials $P_k(x)$ are subject to normalization condition~\ref{eq:norm0}.
The expansion coefficients $\hat{a}_k(x\sub{fit},h)$ are determined from
the equations $\partial \chi^2(x\sub{fit},h)/ \partial \hat{a}_k = 0$
which leads to $M + 1$ simultaneous equations for $k = 0, 1, ..., M$:
\begin{equation}
\sum_{i=0}^{N1} \frac{1}{\sigma_i^2} \left( y_i  \sum_{j=0}^{M} \hat{a}_j(x\sub{fit},h) P_j\left(\frac{x_i  x\sub{fit}}{h}\right) \right) K\left(\frac{x_i  x\sub{fit}}{h}\right) P_k\left(\frac{x_i  x\sub{fit}}{h}\right) = 0.
\label{eq:regsystem}
\end{equation}
If the predictor values $x_i$ are specified on a regular grid of points
then the discretized version of Eq~~\ref{eq:norm0} is just
\begin{equation}
\frac{h_g}{h} \sum_{i=0}^{N1} P_j\left(\frac{x_i  x\sub{fit}}{h} \right) K\left(\frac{x_i  x\sub{fit}}{h}\right) P_k\left(\frac{x_i  x\sub{fit}}{h}\right) = \delta_{jk},
\end{equation}
where $h_g$ is the distance between any two adjacent values of $x_i$.
This leads to a particularly simple solution for $\hat{a}_k(x\sub{fit},h)$
if the model can be assumed at least locally homoscedastic
({\it i.e.,} if all $\sigma_i$ are the same in the vicinity of $x\sub{fit}$):
\begin{equation}
\hat{a}_k(x\sub{fit},h) = \frac{h_g}{h} \sum_{i=0}^{N1} y_i K\left(\frac{x_i  x\sub{fit}}{h}\right) P_k\left(\frac{x_i  x\sub{fit}}{h}\right).
\label{eq:regconvol}
\end{equation}
Substituting this into Eq.~\ref{eq:regmodel}, one gets
\begin{equation}
\hat{\mu}(x\sub{fit}  h) = \frac{h_g}{h} \sum_{i=0}^{N1} \sum_{k=0}^{M} y_i K\left(\frac{x_i  x\sub{fit}}{h}\right) P_k\left(\frac{x_i  x\sub{fit}}{h}\right) P_k(0).
\label{eq:effreg}
\end{equation}
If $K(x)$ is an even function and $x$ is far away from
the boundaries of the interval on which the regression is performed,
Eq.~\ref{eq:effreg} is equivalent to the wellknown NadarayaWatson estimator,
\begin{equation}
\hat{\mu}_{NW}(x\sub{fit}  h) = \frac{\sum_{i=0}^{N1} y_i K\sub{eff}\left(\frac{x_i  x\sub{fit}}{h}\right)}{\sum_{i=0}^{N1} K\sub{eff}\left(\frac{x_i  x\sub{fit}}{h}\right)},
\end{equation}
with the effective kernel given by
\begin{equation}
K\sub{eff}(x) = \sum_{k=0}^{M} P_{k}(0) P_{k}(x) K(x).
\end{equation}
Just as in the case of LOrPE, a taper function can be introduced
for this kernel which leads to Eq~\ref{eq:effk}.
If the predictor values $x_i$ are arbitrary or if the model can not
be considered homoscedastic even locally, there is no
simple formula which solves the linear system~\ref{eq:regsystem}.
In this case the equations must be solved numerically.
To perform this calculation, NPStat calls appropriate routines
from LAPACK~\cite{ref:lapack}.
% Modeling the response by orthogonal polynomial series
% (as opposed to monomial expansion) helps to prevent illconditioning
% of the system and significantly improves the numerical stability
% of the solutions.
Generalization of local least squares methods to multivariate predictors
is straightforward: one simply switches to multivariate kernels
and polynomial systems.
The following NPStat classes can be used to perform local least
squares regression of locally homoscedastic polynomial models
on regular grids:
\cname{LocalPolyFilter1D}, \cname{LocalPolyFilterND}, and \cname{SequentialPolyFilterND}  these
classes have already been mentioned in Section~\ref{sec:lorpe}.
It turns out that LOrPE of discretized data essentially amounts to
local least squares regression on histogram bin contents. To see that,
compare Eqs.~\ref{eq:regconvol} and~\ref{eq:convol}.
Up to an overall normalization
constant, Eq.~\ref{eq:regconvol} is just a discretized version of Eq.~\ref{eq:convol}.
In fact, all three ``PolyFilter'' classes actually perform local least squares regression
which, in the signal analysis terminology, is a linear filtering procedure. If the result
is to be treated as a density, it has to be truncated below zero and renormalized by the user.
\cname{QuadraticOrthoPolyND}  this class supports a finer interface to the
local least squares regression functionality on a grid than \cname{LocalPolyFilterND},
but only for polynomials up to second degree. In addition to the response
itself, this class can be used to calculate the gradient
and the hessian of the local response surface defined by
Eq.~\ref{eq:regmodel}\footnote{This can be useful, for example,
for summarizing properties of loglikelihoods defined on grids
in the space of estimated parameters for some parametric
statistical models.}.
The predictor/response data can be
provided by a method of some class (callback)
which is sometimes more convenient
than using just a~grid of points.
\cname{LocalQuadraticLeastSquaresND}  this class fits local
least squares regression models for arbitrary multivariate predictor values
(no longer required to be on a grid).
The models can be heteroscedastic, as in the most general case of Eq.~\ref{eq:regsystem}.
The polynomials can be at most quadratic. Calculation of the gradient
and the hessian of the local response surface is supported.
\subsection{Local Logistic Regression}
\label{sec:locallog}
For regressing binary response variables,
NPStat implements a method known as ``local quadratic logistic regression''
(LQLR). This method is a trivial extension of the local linear
logistic regression originally proposed in~\cite{ref:localreg}.
In this type of analysis, ``response'' is the probability of
success in a Bernoulli trial, estimated as a function of one or more
predictors. Due to the manner in which it is often used, this probability
will be called ``efficiency'' for the remainder of this section.
In the LQLR model, the efficiency dependence on ${\bf x}$ is represented by
\begin{equation}
\epsilon({\bf x}) = \frac{1}{1 + e^{P({\bf x})}}
\end{equation}
where $P({\bf x})$ is a multivariate quadratic polynomial whose
coefficients are determined at each predictor value ${\bf x}\sub{fit}$
by maximizing the local loglikelihood:
\begin{equation}
L({\bf x}\sub{fit}, h) = \sum_{i=0}^{N1} K_h({\bf x}_{i}  {\bf x}\sub{fit}) \left[y_i \ln (\hat{\epsilon}({\bf x}_{i})) + (1  y_i) \ln (1  \hat{\epsilon}({\bf x}_{i})) \right].
\label{eq:lqlrlogli}
\end{equation}
Here,
$K_h({\bf x}_{i}  {\bf x}\sub{fit})$ is a~suitable localizing kernel which decays to 0
when ${\bf x}_{i}$ is far from ${\bf x}\sub{fit}$, and $y_i$ are the observed values
of the Bernoulli trial: 1 if the point ``passes'' and 0 if it ``fails''.
The local loglikelihood~\ref{eq:lqlrlogli} is very similar to the one
implemented in the Locfit package~\cite{ref:locfit}, the only difference
is that orthogonal polynomials are used in NPStat to construct
$P({\bf x})$ series instead of monomials.
Setting partial derivatives of $L({\bf x}\sub{fit}, h)$ with respect to
polynomial coefficients to 0 results in a system of nonlinear
equations for these coefficients. Solving such a system of equations
does not appear to be any easier than dealing with the original loglikelihood
optimization problem by applying, let say, the LevenbergMarquardt
algorithm~\cite{ref:lvm}.
NPStat includes facilities for efficient calculation of the LQLR
loglikelihood together with
its gradient with respect to $P({\bf x})$ expansion coefficients.
This code does not rely on any specific optimization solver,
and can be easily interfaced to a~number of different external
optimization tools. The relevant classes are:
\cname{LogisticRegressionOnKDTree} (header file ``npstat/stat/LocalLogisticRegression.hh'')  this class calculates the loglikelihood
from Eq.~\ref{eq:lqlrlogli} in the assumption that the kernel $K_h({\bf x})$
has finite support. In this case iterating over all $N$ points in the sample
becomes rather inefficient: there is no reason to cycle over values of
${\bf x}_{i}$ far away from ${\bf x}\sub{fit}$ because
$K_h({\bf x}_{i}  {\bf x}\sub{fit})$ is identically zero for all such points.
To automatically restrict the iterated range, the predictor values
are arranged into a spacepartitioning data structure known
as $k$$d$~tree~\cite{ref:kdtree} which is implemented in NPStat with the
\cname{KDTree} class (header file ``npstat/nm/KDTree.hh'').
\cname{LogisticRegressionOnGrid} (header file ``npstat/stat/LocalLogisticRegression.hh'')  this class calculates the loglikelihood
from Eq.~\ref{eq:lqlrlogli} in the assumption that the predictor values
are histogrammed. Two identically binned histograms must be available: the one
with all values of $y_i$ (``the denominator'') and the one which collects
only those ${\bf x}_{i}$ for which $y_i = 1$ (``the numerator''). Naturally,
only the bins sufficiently close to ${\bf x}\sub{fit}$ are
processed when the LQLR loglikelihood is evaluated for these histograms.
The ``interfaces'' directory of the NPStat package includes two
highlevel driver functions for fitting LQLR response
surfaces (header file
``npstat/interfaces/minuitLocalRegression.hh'').
These functions employ a generalpurpose
optimization package Minuit~\cite{ref:minuit} for maximizing
the loglikelihood. The names of the functions are
\cname{minuitUnbinnedLogisticRegression}
(intended for use with \cname{LogisticRegressionOnKDTree})
and \cname{minuitLogisticRegressionOnGrid}
(for use with \cname{LogisticRegressionOnGrid}).
To use these functions, the Minuit
package has to be compiled and linked together
with the user code which provides the data
and calls the functions.
\subsection{Local Quantile Regression}
\label{sec:localqreg}
The method of least squares allows us to solve the
problem of response mean determination in the regression
context. By recasting calculation of the sample
mean as a minimization problem, we have gained the
ability to condition the mean
on the value of the predictor. In a~similar manner,
the method of least absolute deviations can be used
to determine conditional median.
In the method of least squares,
the expression $S(f) = \sum_{i} f(y_i  \hat{\mu}({\bf x_i}))$
is minimized, with $f(t) = t^2$. The method of least absolute
deviations differs only by setting $f(t) = t$.
Moreover, just like the problem of determination
of response mean can be localized
by introducing a kernel in the predictor space
(resulting in local least squares, as in Eq.~\ref{eq:localleastsq}),
the problem of response median determination
can be subjected to the same localization treatment.
As a method of determination of response location,
local median regression is extremely robust (insensitive to outliers).
Not only the median but an arbitrary
distribution quantile can also be determined in this manner.
The corresponding function to use is
\begin{equation}
f_q(t) = q \,t \,I(t \ge 0)  (1  q) \,t \,I(t < 0),
\end{equation}
where $q$ is the cumulative distribution value of interest, $0 < q < 1$.
You can easily convince yourself of the validity of this statement
as follows. For a sample of points ${y_0, ..., y_{N1}}$,
define $t = y  y_q$, where $y_q$ is a parameter on which $S(f_q)$ depends.
The condition for the minimum of $S(f_q)$, $\frac{d S(f_q)}{d y_q} = 0$,
is then equivalent to $\frac{d S(f_q)}{d t} = 0$.
As $\frac{d f_q(t)}{d t} = q \,I(t > 0)  (1  q) \,I(t < 0)$, the minimum
of $S(f_q)$ is reached when
\begin{equation}
q \sum_{i=0}^{N1} I(y_i > y_q)  (1  q) \sum_{i=0}^{N1} I(y_i < y_q) = 0.
\label{eq:qmin}
\end{equation}
This equation is solved when the number of sample points for which
$y_i > y_q$ is $(1  q) N$ and the number of sample points for which
$y_i < y_q$ is $q N$. But this is precisely the definition of
the sample quantile which, in the limit $N \rightarrow \infty$, becomes
the population quantile of interest.
Unfortunately, solving Eq.~\ref{eq:qmin} numerically
in the regression context is usually significantly more
challenging than solving the corresponding $\chi^2$ minimization
problem. For realistic finite samples, $\frac{d S(f_q)}{d y_q}$ is not
a continuous function, and Eq.~\ref{eq:qmin} can have multiple solutions.
This basically rules out the use of standard gradientbased methods
for $S(f_q)$ minimization.
The following NPStat classes facilitate the solution of the local
quantile regression problem:
\cname{QuantileRegression1D}  calculates the expression to
be minimized (also called the ``loss function'') for a single univariate
predictor value $x\sub{fit}$. This expression looks as follows:
\begin{equation}
S\sub{LQR}(x\sub{fit}) = \sum_{i=0}^{N1} f_q(y_i  \hat{y}_q(x_ix\sub{fit})) w_i,
\end{equation}
where weights $w_i$ are provided by the user (for example, these
weights can be calculated as $w_i = K((x_i  x\sub{fit})/h)$, but more sophisticated
weighting schemes can be applied as well). The quantile dependence on the
predictor is modeled by
\begin{equation}
\hat{y}_q(xx\sub{fit}) = \sum_{k=0}^{M} \hat{a}_k(x\sub{fit}) P_k\left(\frac{x  x\sub{fit}}{h}\right),
\label{eq:quantpred}
\end{equation}
where $P_k(x)$ are either Legendre or Gegenbauer polynomials. The use of Legendre
polynomials is appropriate for a global fit of the regression curve, while
Gegenbauer polynomials are intended for use in combination with symmetric beta
kernels that generate localizing weights~$w_i$. The expansion coefficients
$\hat{a}_k(x\sub{fit})$ are to be determined by minimizing $S\sub{LQR}(x\sub{fit})$.
\cname{QuantileRegressionOnKDTree} (header file ``npstat/stat/LocalQuantileRegression.hh'')
 calculates the quantile regression
loss function for a multivariate predictor (the class method
which returns it is called \cname{linearLoss}):
\begin{equation}
S\sub{LQR}({\bf x}\sub{fit}, h) = \sum_{i=0}^{N1} f_q(y_i  \hat{y}_q({\bf x}_i{\bf x}\sub{fit})) K_h({\bf x}_i  {\bf x}\sub{fit}).
\label{eq:quantregnd}
\end{equation}
The quantile dependence on the predictor is modeled by an expansion similar to
Eq.~\ref{eq:quantpred} in which multivariate orthogonal polynomials
(up to second order) are generated by $K_h({\bf x})$ used as the weight
function. The expansion coefficients are to be determined for each
${\bf x}\sub{fit}$ separately by minimizing
$S\sub{LQR}({\bf x}\sub{fit}, h)$. The predictor values are arranged into
a~$k$$d$~tree structure for reasons similar to those mentioned when
the \cname{LogisticRegressionOnKDTree} class was described.
\cname{QuantileRegressionOnHisto} (header file ``npstat/stat/LocalQuantileRegression.hh'')
 calculates loss function~\ref{eq:quantregnd}
in the assumption that the response is histogrammed on a regular
grid of predictor values (so that the input histogram dimensionality is larger
by one than the dimensionality of the predictor variable).
\cname{CensoredQuantileRegressionOnKDTree} (header ``npstat/stat/ensoredQuantileRegression.hh'')
 this class constructs an
appropriate quantile regression loss function in case some of the response
values are unavailable due to censoring ({\it i.e.,} there is a cutoff
from above or from below on the response values).
For example, this situation can occur in modeling of jet response of a
particle detector when jets with visible energy below certain cutoff
are not reconstructed due to limitations in the clustering procedure.
It is assumed that the censoring efficiency ({\it i.e.,} the fraction
of points not affected by the cutoff) can be determined as a~function
of the predictor by other techniques, like the local logistic regression
described in section~\ref{sec:locallog}. It is also assumed
that the cutoff value is known as a~function of the predictor, and that
the presence of this cutoff is the only reason for inefficiency.
The appropriate loss function in this case is
\begin{equation}
S\sub{CLQR}({\bf x}\sub{fit}, h) = \sum_{i=0}^{N\sub{p}1} \left[f_q(y_i  \hat{y}_q({\bf x}_i{\bf x}\sub{fit})) + \frac{1  \epsilon_i}{\epsilon_i} g_q(\epsilon_i, y_{\mbox{\scriptsize cut},i}  \hat{y}_q({\bf x}_i{\bf x}\sub{fit}))\right] K_h({\bf x}_i  {\bf x}\sub{fit}),
\label{eq:clqr}
\end{equation}
where the summation is performed only over the $N\sub{p}$ points
in the sample surviving the cutoff. $\epsilon_i$ is the censoring
efficiency for the given ${\bf x}_i$. The function
$g_q(\epsilon, t)$ is defined differently for rightcensored (RC) samples
in which surviving points are below the cutoff and leftcensored (LC) samples
in which surviving points are above the cutoff:
\begin{equation}
g_{q,\mbox{\scriptsize RC}}(\epsilon, t) = I(q < \epsilon) f_q(t) + I(q \ge \epsilon) \left(\frac{q  \epsilon}{1  \epsilon} f_q(t) + \frac{1  q}{1  \epsilon} f_q(+\infty) \right)
\label{eq:clqrrc}
\end{equation}
\begin{equation}
g_{q,\mbox{\scriptsize LC}}(\epsilon, t) = I(q > 1  \epsilon) f_q(t) + I(q \le 1  \epsilon) \left( \frac{1  \epsilon  q}{1  \epsilon} f_q(t) + \frac{q}{1  \epsilon} f_q(\infty) \right)
\label{eq:clqrlc}
\end{equation}
Formulae~\ref{eq:clqr}, \ref{eq:clqrrc}, and~\ref{eq:clqrlc} were inspired
by Ref.~\cite{ref:localquantreg}. They can be understood as follows. Consider, for example,
a rightcensored sample. If this sample was not censored, the value of the
response cumulative distribution function at $y_{\mbox{\scriptsize cut},i}$
would be equal $\epsilon_i$.
For each point with response $y_i$ below the cutoff,
there are $(1  \epsilon_i)/\epsilon_i$ unobserved points above the cutoff (and $1/\epsilon_i$ points total).
Before localization, points below
the cutoff contribute the usual amount $f_q(y_i  \hat{y}_q({\bf x}_i{\bf x}\sub{fit}))$
into $S\sub{CLQR}({\bf x}\sub{fit}, h)$ (compare with
Eq.~\ref{eq:quantregnd}). To the points above cutoff, we assign
the value of response which equals either
$y_{\mbox{\scriptsize cut},i}$ or $+\infty$, in such a manner that the estimate
$\hat{y}_q({\bf x}_i{\bf x}\sub{fit})$ is ``pushed'' in the
right direction and by the right amount when it crosses $y_{\mbox{\scriptsize cut},i}$.
If the estimated quantile is less than the efficiency, all unobserved points
are assigned the response value $y_{\mbox{\scriptsize cut},i}$
(this corresponds to the term $I(q < \epsilon) f_q(t)$ in the Eq.~\ref{eq:clqrrc}). This is
because we know that the correct value of $\hat{y}_q({\bf x}_i{\bf x}\sub{fit})$
should be below $y_{\mbox{\scriptsize cut},i}$, so the penalty for placing
$\hat{y}_q({\bf x}_i{\bf x}\sub{fit})$ above the cutoff is generated
by all points, including the unobserved ones. If the chosen quantile is
larger than the efficiency, the only thing we know is that the correct value of
$\hat{y}_q({\bf x}_i{\bf x}\sub{fit})$ should be inside the interval
$(y_{\mbox{\scriptsize cut},i}, +\infty)$. There is no reason to prefer
any particular value from this interval, so the overall contribution of
sample point $i$ for which $\hat{y}_q({\bf x}_i{\bf x}\sub{fit}) \in (y_{\mbox{\scriptsize cut},i}, +\infty)$
into $S\sub{CLQR}({\bf x}\sub{fit}, h)$ must not depend on
$\hat{y}_q({\bf x}_i{\bf x}\sub{fit})$\footnote{If most points ${\bf x}_i$
are like that for some ${\bf x}\sub{fit}$ value, the fit becomes
unreliable. Consider increasing the bandwidth of the localization
kernel or avoid such situations altogether. You should not expect
to obtain good results for high (low) $q$ values and all possible
${\bf x}\sub{fit}$ in a right (left)censored sample.}
We do, however, want to prevent $\hat{y}_q({\bf x}_i{\bf x}\sub{fit})$ from leaving
this interval. This is achieved by placing so many points at $y_{\mbox{\scriptsize cut},i}$
that the fraction of sample points (including both observed and unobserved ones)
at or below $y_{\mbox{\scriptsize cut},i}$ is exactly $q$, and this is precisely
what the term proportional to $I(q \ge \epsilon)$ does in Eq.~\ref{eq:clqrrc}.
Similar reasoning applied to a leftcensored sample leads to Eq.~\ref{eq:clqrlc}.
Naturally, in the computer program, $\infty$ and $+\infty$ in Eqs.~\ref{eq:clqrrc} and~\ref{eq:clqrlc}
should be replaced by suitable userprovided numbers which are known to be below and above,
respectively, all possible response values. In order to avoid deterioration
in $S\sub{CLQR}({\bf x}\sub{fit}, h)$ numerical precision, these numbers should not be
very different from the minimum and maximum observed response.
\cname{CensoredQuantileRegressionOnHisto} (header ``npstat/stat/ensoredQuantileRegression.hh'')
 this class calculates
the loss function~\ref{eq:clqr}
in the assumption that all information about response, efficiency, and cutoffs is
provided on a~regular grid in the space of predictor values.
The ``interfaces'' directory of the NPStat package includes several highlevel
driver functions for performing local quantile regression. These driver functions
use the simplex minimization method of the Minuit package to perform local fitting of the
quantile curve expansion coefficients.
\cname{minuitLocalQuantileRegression1D} function uses \cname{QuantileRegression1D}
class internally to perform local quantile regression with onedimensional predictors.
This driver function can be used, for example, to construct Neyman belts from
numerical simulations of some statistical estimator.
A similar function, \cname{weightedLocalQuantileRegression1D}, can be used
to perform local quantile regression with onedimensional predictors when
the points are weighted.
The \cname{minuitQuantileRegression} driver function can use one
of the \cname{QuantileRegressionOnKDTree}, \cname{QuantileRegressionOnHisto},
\cname{CensoredQuantileRegressionOnKDTree}, or
\cname{CensoredQuantileRegressionOnHisto} classes
(all of which inherit from a common base)
to perform local quantile regression with multivariate predictors.
The function \cname{minuitQuantileRegressionIncrBW} has similar
functionality but it can also automatically increase the localization kernel
bandwidth so that not less than a certain predefined fraction of the whole
sample participates in the the local quantile determination
for each ${\bf x}\sub{fit}$.
\subsection{Iterative Local Least Trimmed Squares}
The NPStat package includes an implementation of an iterative
local least trimmed
squares (ILLTS) algorithm applicable when predictor/response
values are supplied on a regular grid of points in a multivariate
predictor space (think image denoising). The algorithm operation
consists of the following steps:
\begin{enumerate}
\item Systems of orthogonal polynomials up to userdefined degree $M$
are constructed using weight functions $K_{h,j}({\bf x})$.
These weight functions are defined using symmetric
finite support kernels in which the point in the kernel
center together with $j  1$ other points are set to 0.
Imagine, for example, a $5 \times 5$ grid in two dimensions.
Start with the uniform $K_h({\bf x})$ kernel which is 1 at every
grid point. The $K_{h,1}({\bf x})$ weight function
is produced from $K_h({\bf x})$ by setting the central grid
cell to 0. 24 weight functions of type $K_{h,2}({\bf x})$
are produced from $K_{h,1}({\bf x})$ by setting one other
grid cell to 0, in addition to the central one. 276
weight functions of type $K_{h,3}({\bf x})$ are produced
from $K_{h,1}({\bf x})$ by choosing 2 out of 24 remaining
cells which are set to 0, and so on.
\item Local least squares regression is performed using all
polynomial systems generated by $K_{h,j}({\bf x})$ weights
for all possible positions of the kernel inside the predictor
grid (sliding window)\footnote{Edge effects are taking into
account by constructing special polynomial systems
which use boundary kernels similar to
$K_{h,j}({\bf x})$ but with different placement of zeros.}.
For each kernel position,
we find the polynomial system which produces the best
$\chi^2$ calculated over grid points for
which the weight function is not 0. For this polynomial
system, we determine the value
$\Delta = y_c  y_{c,\mbox{\scriptsize fit}}$ for the
kernel center, where $y_c$ is the response value in
the data sample and $y_{c,\mbox{\scriptsize fit}}$ is
the response value produced by the local least squares fit.
\item The position of the kernel is found for which
$\Delta$ is the largest in the whole data sample.
\item The response for the position found
in the previous step is adjusted by setting it
to the value fitted at the kernel center.
\item $\Delta$ is recalculated for all kernel positions affected by
the adjustment performed in the previous step.
\item The previous three steps are repeated until some stopping
criterion is satisfied. For example, the requirement that
the largest $\Delta$ in the grid becomes small (below certain
cutoff) can serve as such a stopping criterion.
\end{enumerate}
The ILLTS algorithm works best if the fraction of outliers in the
sample is relatively small and the response errors are homoscedastic.
ILLTS is expected to be more efficient (in the statistical sense) than the local
quantile regression. If the spectrum of response errors is not known
in advance, it becomes very instructive to plot the history
of $\Delta$ values for which response adjustments were performed.
This plot often exhibits two characteristic ``knees'' which
correspond to the suppression of outliers and suppression of ``normal''
response noise. The procedure can be stopped somewhere
between these knees and followed up by normal local
least squares on the adjusted sample, perhaps utilizing
different bandwidth.
Unfortunately, computational complexity of the ILLTS
algorithm is increasing exponentially
with increasing $j$, so only very small values of $j$ are
practical. The inability to use high values of $j$ can
be partially compensated for by choosing smaller bandwidth
values and performing more iteration cycles. More cycles
result in larger {\it effective} bandwidth  think, for example,
what happens when you perform local least squares multiple
times. For certain kernels like Gaussian or Cauchy
({\it i.e.,} stable distributions) and polynomial degree $M = 0$ (local
constant fit),
this is exactly equivalent to choosing larger bandwidth.
For other types of kernels not only the effective bandwidth increases
with the number of passes
but also the effective kernel shape gets modified\footnote{ILLTS adds
the dimension of time (adjustment cycle number) to the
solution of robust regression problem. It would be interesting to explore
this, for example, by introducing timedependent bandwidth or
by studying the connection between the ILLTS updating scheme and
the numerous updating schemes employed in solving
partial differential equations.}.
The toplevel API function for running the ILLTS algorithm is called
\cname{griddedRobustRegression}. The $K_{h,1}({\bf x})$ weights
can be used by supplying an object of \cname{WeightedLTSLoss} type
as its loss calculator argument, and $K_{h,2}({\bf x})$ weights
are used by choosing \cname{TwoPointsLTSLoss} instead. A simple
stopping criterion based on the $\Delta$ value, local least trimmed
squares $\chi^2$, or the number of adjustment cycles can be specified
with an object of \cname{GriddedRobustRegressionStop} type.
The \cname{griddedRobustRegression} implementation
is rather general, and can accept userdeveloped loss calculators and
stopping criteria.
\subsection{Organizing Regression Results}
It is usually desirable to calculate the regression
surface on a reasonably fine predictor grid and save the
result of this calculation for subsequent
fast lookup. A general inerface for such a lookup is provided
by the \cname{AbsMultivariateFunctor} class
(header file ``npstat/nm/AbsMultivariateFunctor.hh''). This class
can be used to represent results of both parametric and
nonparametric fits.
Persistent classes \cname{StorableInterpolationFunctor} and
\cname{StorableHistoNDFunctor} derived from \cname{AbsMultivariateFunctor}
are designed to represent
nonparametric regression results. Both of these classes
assume that the regression was
performed on a (hyper)rectangular grid of predictor
points. \cname{StorableInterpolationFunctor}
supports multilinear interpolation and extrapolation of results,
with flexible extrapolation (constant or linear) for each
dimension beyond the grid boundaries. This class essentially
combines the \cname{AbsMultivariateFunctor} interface with
the functionality of the \cname{LinInterpolatedTableND} class
discussed in more detail in Section~\ref{sec:utilities}.
The class \cname{StorableHistoNDFunctor} allows for constant,
multilinear, and multicubic interpolation\footnote{Multicubic
interpolation is supported for uniform grids only.}
inside the grid boundaries, and for constant extrapolation outside
the boundaries (the extrapolated response value is set to its value at
the closest boundary point). Both \cname{StorableInterpolationFunctor}
and \cname{StorableHistoNDFunctor} support arbitrary transformations
of the response variable via a userprovided functor. If a
transformation was initially applied to the
response values in order to simplify subsequent modeling,
this is a good place to perform the inverse.
\section{Unfolding with Smoothing}
\label{sec:unfolding}
In particle physics, the term {\it unfolding} is used to describe
methods of nonparametric reconstruction of probability densities
using observations affected by particle detector resolution and inefficiency
(see~\cite{ref:unfoldingrev} for a contemporary review).
In other natural sciences, the term {\it inverse problem} is commonly
used~\cite{ref:numrecipes}, while in the statistical literature
a more specific name, {\it deconvolution density estimate}, is
becoming the norm~\cite{ref:deconvolutionbook}.
\subsection{Unfolding Problem}
For the purpose of stating the unfolding problem,
it will be assumed that the detector can be described by an
operator $K$. This operator (also called {\it kernel},
{\it transfer function}, {\it observation function},
or {\it response function}, depending on the author and context)
converts probability
densities $p({\bf x})$
in the physical process space ${\bf x}$ into the densities $q({\bf y})$ in the
observation space ${\bf y}$:
$q = K p \equiv \int K({\bf y}, {\bf x}) p({\bf x}) {\bf dx}$.
The response function does not have
to be fully efficient: $q$ does not have to integrate to 1 when
$p$ is normalized. In the subsequent discussion, operator $K$ will be assumed
linear and exactly known but not necessarily invertible.
In many situations of interest, observations are described by the empirical
density function ({\it i.e.}, there is no error term associated with
each individual observation):
\begin{equation}
\rho_e({\bf y}) = \frac{1}{N} \sum_{i=1}^{N} \delta({\bf y}  {\bf y}_i).
\end{equation}
In this case, the probability to observe a point at ${\bf y}_i$ is given by
the normalized version of $q$ called $r$: $r = q/\epsilon$.
In case $p$ is normalized,
\begin{equation}
\epsilon = \int q({\bf y}) \,{\bf dy} = \int K p \,{\bf dy}
\label{eq:efficiency}
\end{equation}
is the overall detector
acceptance for the physical process under study.
The purpose of unfolding is to learn as much as possible about $p({\bf x})$
given $\rho_e({\bf y})$ when a~parametric model for $p({\bf x})$ is lacking.
The difficulty of this problem can be easily appreciated from the following
argument. $K$ typically acts as low pass filter. For measurements of
a~single scalar quantity, it can often be assumed that the detector
has resolution $\sigma$ and that $K(y, x) = {\cal N}(y  x, \sigma^2)$, so that
the detector simply convolves $p(x)$ with the normal distribution.
If $q$ was exactly known, the Fourier transform of $p$ could simply
be obtained from $p(\omega) = q(\omega)/K(\omega)$. As $q(\omega)$ is
not known, the closest available approximation is the characteristic
function of $\rho_e(y)$:
$\rho_e(\omega) = \int \rho_e(y) e^{i \omega y} dy = \frac{1}{N} \sum_{i=1}^{N} e^{i \omega y_i}$. As the characteristic function of the normal
distribution is just $K(\omega) = e^{\sigma^2 \omega^2/2}$, the ratio
$\rho_e(\omega)/K(\omega)$ becomes arbitrarily large
as $\omega \rightarrow \infty$. The ``naive'' method of
estimating $p(\omega)$ as $\rho_e(\omega)/K(\omega)$ thus fails
miserably: the high frequency components of the noise contained
in the $\rho_e(\omega)$ are multiplied by an arbitrarily large factor
so that $\rho_e(\omega)/K(\omega)$ isn't even squareintegrable.
A number of effective
approaches to solving the pure deconvolution problem just described
are discussed in~\cite{ref:deconvolutionbook}. These approaches invariably involve
introduction of additional smoothness assumptions about either
$p(x)$ or $q(y)$ or both. Such
assumptions essentially declare that the high frequency components of $p(x)$
are of little interest and, therefore, can be suppressed in the
$\rho_e(\omega)/K(\omega)$ ratio (so that the inverse Fourier transform
can exist). Introduction of new information by applying additional assumptions
which make an~originally illposed problem treatable is called
{\it regularization}.
In the problems of interest for particle physics, the action
of $K({\bf y}, {\bf x})$ on $p({\bf x})$ is usually more complicated than
simple convolution. At the time of this writing,
reconstruction of $p({\bf x})$ in particle physics applications is most often
performed by either the SVD unfolding~\cite{ref:svdunfold}
or the expectationmaximization
(a.k.a. D'Agostini, or Bayesian)
unfolding~\cite{ref:dagounfold}\footnote{In other disciplines,
utility of these methods has been discovered earlier~\cite{ref:unfoldingrev}.}.
Both of these methods start with the
assumption that ${\bf x}$ and ${\bf y}$ spaces are binned, and that
partitioning of these spaces is beyond the control of the method.
In the SVD unfolding, onedimensional ${\bf x}$ is assumed, and the
regularization is performed by penalizing
discretized second derivative of the $p(x)$
density\footnote{This regularization technique has been reinvented
many times. Depending on the problem, is also called
the {\it constrained linear inversion method},
the {\it PhillipsTwomey method}, the {\it Tikhonov regularization},
or the {\it ridgeparameter approach}~\cite{ref:numrecipes, ref:deconvolutionbook}.}. In the expectationmaximization
unfolding, regularization usually consists in imposing
a~subjective limit on the number of iteration cycles. This
early stopping criterion penalizes
deviations from the prior distribution used to start the iterations.
However, due to the method nonlinearity, it is difficult to
augment this statement with analytical derivations
of the penalty applicable to arbitrary response functions.
It should be appreciated that, for these methods,
the sample binning itself
serves as a part of problem regularization. Just imagine
making very wide bins  this leads to a~response
matrix which is almost diagonal and easily invertible.
However, information about small structures within each bin
is now lost.
\subsection{EMS Unfolding}
The unfolding method implemented in NPStat combines smoothing in the
${\bf x}$ space with the expectationmaximization iterations performed
until convergence.
This combination (abbreviated
as EMS  expectationmaximization with smoothing)
has important advantages over SVD unfolding and
expectationmaximization with early stopping:
\begin{thinlist}
\item{{\it Ad hoc} binning is no longer needed. While the solution is
still implemented on a grid, the cell width
can be chosen sufficiently fine
so that discretization does not affect the results.}
\item{Precise response functions can be employed instead of their
coarsely binned versions affected by prior distribution
assumptions.}
\item{Problem regularization can be unambiguously described in terms of the
parameters of the smoothing filter used (bandwidth, {\it etc}).}
\end{thinlist}
While the empirical success of the EMS unfolding method
has already been reported in the statistical
literature~\cite{ref:nychka, ref:smoothedem, ref:dagounfold},
the procedures implemented in NPStat also address the issues
of uncertainty estimation for the reconstructed distribution and of
choosing the filter parameters according to an objective
criterion.
The method proceeds as follows. The standard expectationmaximization
iterations update the reconstructed values of $p({\bf x})$ according to the formula~\cite{ref:nychka, ref:smoothedem}
\begin{equation}
\lambda_j^{(k+1)} = \frac{\lambda_j^{(k)}}{\epsilon_j} \sum_{i=1}^{n} \frac{K_{ij} y_i}{\sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(k)}}.
\end{equation}
Here, $\lambda_j^{(k)}$ are the unnormalized $p({\bf x})$ values
(event counts) discretized on a sufficiently fine grid
in the physical process space ${\bf x}$
(whose cells are small in comparison with the typical size of response function
features), obtained on a~$k$\supers{th} iteration. The index $j = 1, ..., m$ refers to
the~{\it linearized} cell number in this (possibly multidimensional) grid.
All $\lambda_j^{(0)}$ values (the starting point for the iterations) can normally
be set to the same constant $c = N/(\epsilon \, m)$, where $N$ is the
number of observed events and $\epsilon$ is the overall detector efficiency
for a~constant $p({\bf x})$. $y_i$, $i = 1, ..., n$, denotes the number
of observed events inside the cell with linearized index $i$ in the space
of observations ${\bf y}$. Dimensionalities of the ${\bf x}$ and ${\bf y}$
spaces can be arbitrary and distinct. $K_{ij}$ is the discretized response
matrix. It is the probability that
an event from the physical cell $j$ causes an observation in the cell
$i$ of the ${\bf y}$ space. $\epsilon_j = \sum_{i=1}^{n} K_{ij}$ is the
detector efficiency for the physical cell $j$.
These iterations are modified by introducing a smoothing step. The updating
scheme becomes
\begin{eqnarray}
\lambda_j^{*(k+1)} & = & \frac{\lambda_j^{(k)}}{\epsilon_j} \sum_{i=1}^{n} \frac{K_{ij} y_i}{\sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(k)}}, \\
\lambda_r^{(k+1)} & = & \alpha^{(k+1)} \sum_{j=1}^m S_{r j} \lambda_j^{*(k+1)},
\label{eq:semscheme}
\end{eqnarray}
where $S_{r j}$ is the {\it smoothing matrix}, and
the smoothing step normalization constant, $\alpha^{(k+1)}$, preserves the
overall event count obtained during the preceding expectationmaximization
step (so that $\sum_{r=1}^m \lambda_r^{(k+1)} = \sum_{j=1}^m \lambda_j^{*(k+1)}$). The values $\lambda_j^{(\infty)}$
obtained upon iteration convergence are therefore solutions of the equation
\begin{equation}
\lambda_r^{(\infty)} = \alpha^{(\infty)} \sum_{j=1}^m S_{r j} \frac{\lambda_j^{(\infty)}}{\epsilon_j} \sum_{i=1}^{n} \frac{K_{ij} y_i}{\sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(\infty)}},
\label{eq:semeq}
\end{equation}
where $\alpha^{(\infty)} = \sum_{r=1}^m \lambda_r^{*(\infty)} / \sum_{r=1}^m \sum_{j=1}^m S_{r j} \lambda_j^{*(\infty)}$.
The equation for the error propagation matrix,
$J_{rs} \equiv \frac{\partial \lambda_r^{(\infty)}}{\partial y_s}$, can be obtained
by differentiating Eq.~\ref{eq:semeq} w.r.t. $y_s$. In the matrix notation, this
equation is
\begin{equation}
{\bf J} = (\alpha^{(\infty)} {\bf S} + {\bf A}) \left({\bf M} + {\bf B J}\right),
\label{eq:errprop}
\end{equation}
where
\begin{eqnarray}
A_{j q} & = & \frac{\left(1  \alpha^{(\infty)} \sum_{r=1}^m S_{rq} \right) \lambda_j^{(\infty)}}{\sum_{i=1}^{m} \lambda_i^{(\infty)}}, \label{eq:errpropmatricesa} \\
B_{j q} & = & \frac{\delta_{j q}}{\epsilon_j} \sum_{i=1}^{n} \frac{K_{ij } y_i}{\sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(\infty)}}  \frac{\lambda_j^{(\infty)}}{\epsilon_j} \sum_{i=1}^{n} \frac{K_{iq} K_{ij} y_i}{\left(\sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(\infty)}\right)^2},\\
M_{j q} & = & \frac{\lambda_j^{(\infty)}}{\epsilon_j} \frac{K_{q j }}{\sum_{\rho=1}^{m} K_{q\rho} \lambda_{\rho}^{(\infty)}}.
\end{eqnarray}
The NPStat code solves the equivalent equation, $({\bf I}  (\alpha^{(\infty)} {\bf S} + {\bf A}) {\bf B}) \, {\bf J} = (\alpha^{(\infty)} {\bf S} + {\bf A}) {\bf M}$, using the LU
factorization algorithm as implemented in LAPACK~\cite{ref:lapack}, and then
runs iterative refinement cycles defined by Eq.~\ref{eq:errprop}
until convergence.
For unweighted samples, the covariance matrix of observations, ${\bf V}$,
can be derived automatically by NPStat according to either Poisson or
multinomial distribution assumptions
using $\hat{y}_i = \sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(\infty)}$ as mean values.
In more complicated situations, the user is expected to provide the
covariance matrix of observations\footnote{This may require running the
unfolding code twice, first to obtain $\lambda_{\rho}^{(\infty)}$, and then,
when the covariance matrix is constructed externally, to propagate the uncertainties.
Note that unfolding with the expectationmaximization algorithm intrinsically
assumes Poisson distribution of the observed counts. If the covariance matrix of observations
is highly inconsistent with this assumption, it will be impossible to interpret
the result as a maximum likelihood estimate.}.
The covariance matrix of unfolded values is then estimated according to
${\bf J} {\bf V} {\bf J}\supers{T}$.
While the original expectationmaximization algorithm is agnostic about the dimensionalities
of ${\bf x}$ and ${\bf y}$ spaces, the smoothing step is, of course, dimensionalityspecific.
The unfolding code is using smoothing filters constructed with the help of facilities described
in Section~\ref{sec:densityestimation}. The following classes implement unfolding with
smoothing:
\cname{SmoothedEMUnfold1D} (header file ``npstat/stat/SmoothedEMUnfold1D.hh'')  unfolds
onedimensional distributions using objects of \cname{LocalPolyFilter1D} class as
smoothing filters.
\cname{SmoothedEMUnfoldND} (header file ``npstat/stat/SmoothedEMUnfoldND.hh'')  unfolds
multivariate distributions. Dimensionalities of the ${\bf x}$ and ${\bf y}$ spaces
can be arbitrary
and distinct. Dimensionality of the smoothing filter is, of course, expected to be
consistent with the structure of ${\bf x}$. Typically, the filters will be objects of
either \cname{LocalPolyFilterND} or \cname{SequentialPolyFilterND} class, adapted
for unfolding use by the \cname{UnfoldingFilterND} template.
\cname{SmoothedEMUnfoldND} code is employing a more efficient
implementation of the response matrix than the \cname{Matrix} class
used by the rest of NPStat code, but Eq.~\ref{eq:errprop}
is still solved using normal dense matrices. Therefore, practically
usable number of cells in the discretization
of the ${\bf x}$ space will be at most
a~few thousands, as the computational complexity of the
algorithm based on dense matrices is proportional to this number cubed.
\subsection{Choosing the Smoothing Parameters}
For likelihoodbased inference, a useful model selection principle
is provided by the Akaike information criterion (AIC)~\cite{ref:aic}.
The AIC criterion adjusted for the finite sample size is~\cite{ref:modelsel}
\begin{equation}
AIC_c = 2 \ln L + 2 k + \frac{2 k (k + 1)}{N  k  1},
\label{eq:aicc}
\end{equation}
where $L$ is the maximized value of the model likelihood,
$N$ is the sample size, and $k$ the number of parameters
in the model. Selecting a model
by minimizing $AIC_c$ avoids overfitting by reaching a compromise
between complexity of the model and goodnessoffit.
Application of the $AIC_c$ criterion to the EMS
unfolding procedure is, however, not completely straightforward.
While calculation of the likelihood
can be accomplished assuming Poisson distribution in the space
of observations,
it is not immediately obvious how to count the number of parameters
in the model. To overcome this difficulty,
NPStat assumes that the number of model parameters
can be estimated as the {\it effective rank} of the
${\bf K} {\bf J} {\bf J}\supers{T} {\bf K}\supers{T}$ matrix.
This assumption is based on the following reasoning\footnote{One can
also argue that the ${\bf K} {\bf J}$ matrix
plays similar role to the hat matrix in linear regression problems.
This leads to the same conclusion about the number of model parameters.}.
The covariance matrix of the fitted folded values ({\it i.e.}, $\hat{y}_i$) is
${\bf V}_{\hat{y}}({\bf V}) = {\bf K} {\bf J} {\bf V} {\bf J}\supers{T} {\bf K}\supers{T}$. If, using polynomial series, one fits multiple independent
samples of random points taken from the uniform distribution, with
the number of points per sample varying according to the Poisson distribution,
the rank of the covariance matrix of the
fitted unnormalized density values calculated over these samples
will be equal to the degree of the fittted polynomial plus one.
This is precisely the number of parameters of the fitted model.
It doesn't matter how many abscissae are used to construct the covariance
matrix of the fitted values as long as the number
of abscissae exceeds the degree of the polynomial and the
average number or points in a sample is ``sufficiently large''.
While the model fitted to the observed values by the EMS unfolding method
isn't polynomial, we can still identify some measure of the rank of
${\bf V}_{\hat{y}}({\bf I}) = {\bf K} {\bf J} {\bf J}\supers{T} {\bf K}\supers{T}$ with the number of model parameters.
Two different definitions of the effective
rank of a symmetric positivesemidefinite matrix
(say, {\bf Q}) are implemented.
The first one is the exponent
of the von Neumann entropy of ${\bf Q}/\mbox{tr} ({\bf Q})$. In terms
of the ${\bf Q}$ eigenvalues, $e_i$, it is expressed
as\footnote{Naturally, $\mbox{erank}_1({\bf Q})$ is also the exponent of the Shannon entropy of the normalized eigenspectrum.}
\begin{equation}
\mbox{erank}_1({\bf Q}) = \exp \left\{ \sum_{i=1}^n \frac{e_i}{\e\} \ln \left( \frac{e_i}{\e\} \right) \right\},
\ \ \ \e\ = \sum_{i=1}^n e_i.
\end{equation}
The second is the ratio of the matrix trace to the largest eigenvalue:
\begin{equation}
\mbox{erank}_2({\bf Q}) = \frac{\mbox{tr} ({\bf Q})}{\max_{1 \le i \le n} e_i} = \frac{\e\}{\max_{1 \le i \le n} e_i}.
\end{equation}
These effective ranks are calculated by the \cname{symPSDefEffectiveRank} method
of the NPStat \cname{Matrix} class.
From some initial experimentation with simple models, it appears that
setting $k$ in Eq.~\ref{eq:aicc} to either $\mbox{erank}_1({\bf K} {\bf J} {\bf J}\supers{T} {\bf K}\supers{T})$ or $\mbox{erank}_2({\bf K} {\bf J} {\bf J}\supers{T} {\bf K}\supers{T})$ works well, as both of these ranks have similar
derivatives w.r.t. filter bandwidth. The corresponding $AIC_c$ criteria
will be called $EAIC_c$ ($E$ in this abbreviation stands for ``entropic'')
and $TAIC_c$ ($T$ stands for ``trace'').
For any smoothing filter, the most important parameter regulating its
functionality is the bandwidth. NPStat provides the following classes
which facilitate the studies of $AIC_c$ criteria and unfolding performance
vs. filter bandwidth:
\cname{UnfoldingBandwidthScanner1D} (header ``npstat/stat/UnfoldingBandwidthScanner1D.hh'')  simplifies and speeds up studies of 1d EMS unfolding
results obtained with multiple
bandwidth values. Filters constructed for different
bandwidth settings are stored internally
and do not have to be recomputed when
multiple observed samples are processed. Effective ranks of the
${\bf K} {\bf J} {\bf J}\supers{T} {\bf K}\supers{T}$ matrices
are memoized as well.
The bandwidth which optimizes either
$EAIC_c$ or $TAIC_c$ criterion can be automatically searched for by
calling the \cname{processAICcBandwidth} method of this class.
Typical class usage is
illustrated by the ``ems\_unfold\_1d.cc'' example program located
in the ``examples/C++'' subdirectory of the NPStat package.
\cname{UnfoldingBandwidthScannerND} (header ``npstat/stat/UnfoldingBandwidthScannerND.hh'')  assists in studies of multivariate EMS unfolding
results obtained with multiple bandwidth settings. \cname{SequentialPolyFilterND}
class is used internally to generate multivariate filters
according to the userprovided
bandwidth values in each dimension of the ${\bf x}$ space. The filters
and the effective ranks are memoized, but there is no automatic procedure
to determine the optimal bandwidth set\footnote{The
\cname{UnfoldingBandwidthScannerND} class
is designed to speed up repetitive mutidimensional grid scans.
For more sophisticated algorithms searching for a~function minimum
in multiple dimensions,
the strategy of memoizing a lot of intermediate
results for each point considered
will not be optimal.}.
Use of an effective rank to determine the number of model parameters
leads to the requirement that the number of discretization cells in
the observation space ${\bf y}$ should be substantially larger than
this rank. This condition should be verified once the
EMS unfolding is performed with the optimal filter.
If the unfolded values of $p({\bf x})$ are expected to become close
to 0 somewhere in the ${\bf x}$ region considered, it is important
to choose a type of filter that guarantees the nonnegativity of
the smoothed result\footnote{The code will automatically truncate negative
filtered values but this truncation will lead to distortions in
the covariance matrix of unfolded results not taken into account by linear
error propagation formulae. A~highly inappropriate choice
of filter can even break the expectationmaximization iterations
by producing $\hat{y}_i = 0$ corresponding to a positive ${y}_i$ value.}.
All LOrPE filters of degree zero have this property.
\subsection{Large Problems with Sparse Matrices}
The standard matrix multiplication methods and the LU factorization algorithm
used to solve Eq.~\ref{eq:errprop} have $O(m^3)$ computational complexity,
where $m$ is the number of cells used to discretize the physical process space.
For programs running
on a single processor, this complexity effectively limits $m$
to a~few thousands. However, in many practically important applications
the response function is sufficiently shortranged so that most elements
of its discretized representation $K_{ij}$ are zeros. In such situations
it may be advantageous to utilize techniques and algorithms developed for
sparse matrices in order to increase $m$.
NPStat provides a headeronly EMS unfolding implementation based on sparse
matrix arithmetic in the namespace ``emsunfold''. In order to take advantage
of this code,
two additional software packages have to be installed by the user.
NPStat relies on Eigen~\cite{ref:eigen} for basic operations
with sparse matrices and for solving sparse linear systems. The package
TRLAN~\cite{ref:trlan} is used for calculating leading eigenvalues and
eigenvectors of large covariance matrices\footnote{I am not aware of any
method for calculating {\it all} eigenvalues and eigenvectors of
a symmetric positivesemidefinite matrix, sparse or not,
with computational complexity better than $O(m^3)$.}.
The following classes implement EMS unfolding with sparse matrices:
\cname{SmoothedEMSparseUnfoldND}
(header ``npstat/emsunfold/SmoothedEMSparseUnfoldND.hh'')
 parallels the functionality of the \cname{SmoothedEMUnfoldND} class, with a few
peculiarities related to the sparseness of various matrices. In particular, only Poisson uncertainties
on the observed data can be calculated automatically, as multinomial covariances
would not be sparse. Userprovided uncertainties are restricted to diagonal
matrices. The smoothing matrix should be doubly
stochastic\footnote{That is, each row and column
of the smoothing matrix should sum to 1, within the tolerance parameter
of the unfolding algorithm.
Construction of doubly stochastic filters with NPStat is discussed
in Section~\ref{sec:bernstein}. Double stochasticity is not
enforced by the code itself, as it is often useful to benchmark the results obtained with sparse
matrices against an implementation utilizing dense matrices.},
otherwise Eq.~\ref{eq:errpropmatricesa} will inject a dense matrix
into the system.
\cname{SparseUnfoldingBandwidthScannerND}
 parallels the functionality of \cname{UnfoldingBandwidthScanner1D},
with additional input parameters and diagnostic outputs needed to drive determination of
eigenvalues and eigenvectors of large covariance matrices. This class
is declared in the header file
``npstat/emsunfold/SparseUnfoldingBandwidthScannerND.hh''.
\section{Pseudo and QuasiRandom Numbers}
\label{sec:randomgen}
The C++11 Standard~\cite{ref:cpp11} defines an API for generating
pseudorandom numbers. Unfortunately, this API
suffers from a disconnect between modeling of statistical
distributions (including densities, cumulative distributions, {\it etc})
and generation of random numbers. Moreover,
quasirandom numbers~\cite{ref:Niederreiter} useful for
a large variety of simulation and data analysis purposes
are not represented by the Standard,
and there is no meaningful support for generating genuinely
multivariate random sequences.
NPStat adopts a different approach towards generation of pseudorandom,
quasirandom,
and nonrandom sequences 
the one that is more appropriate in the context
of a~statistical
analysis package. A small number of highquality generators is
implemented for
producing such sequences on a unit $d$dimensional cube.
All such generators inherit from the same abstract base class
\cname{AbsRandomGenerator} (header file ``npstat/rng/AbsRandomGenerator.hh'').
Generators developed or ported by users can be seamlessly
added as well. Conversion of uniformly distributed sequences into other
types of distributions and to different support regions is performed
by the classes that represent statistical distributions  in particular,
by those classes which inherit from \cname{AbsDistribution1D} and
\cname{AbsDistributionND} bases. Both of these bases have a virtual
method \cname{random} which takes a sequence generator instance as an
input and produces correspondingly distributed numbers (random or not)
on output.
By default, transformation of sequences is performed by the \cname{quantile}
method for onedimensional distributions and by \cname{unitMap}
method for multivariate ones. However, it is expected that the
\cname{random} method itself will be overriden by the derived classes
when it is easier, for example, to make
a random sequence with the desired properties by the acceptancerejection
technique. The following functions and classes are implemented:
\cname{MersenneTwister} (header file ``npstat/rng/MersenneTwister.hh'')
 generates pseudorandom numbers using the
Mersenne Twister algorithm~\cite{ref:mercenne}.
\cname{SobolGenerator} (header file ``npstat/rng/SobolGenerator.hh'')
 generates Sobol quasirandom
sequences~\cite{ref:sobol}.
\cname{HOSobolGenerator} (header file ``npstat/rng/HOSobolGenerator.hh'')
 generates higher order scrambled Sobol
sequences~\cite{ref:hosobol}.
\cname{RandomSequenceRepeater} (header file ``npstat/rng/RandomSequenceRepeater.hh'')
 this class can be used to produce
multiple repetitions of sequences created by other generators
whenever an instance of \cname{AbsRandomGenerator} is needed.
The whole
sequence is simply remembered (which can take a significant
amount of memory
for large sequences) and extended as necessary
by calling the original generator.
\cname{WrappedRandomGen} (header file ``npstat/rng/AbsRandomGenerator.hh'')
 a simple adaptor class for ``old style''
random generator functions like ``drand48()''. Implements
\cname{AbsRandomGenerator} interface.
\cname{CPP11RandomGen} (header file ``npstat/rng/CPP11RandomGen.hh'')
 a simple adaptor class for pseudorandom
generator engines defined in the C++11 standard. Implements
\cname{AbsRandomGenerator} interface.
\cname{EquidistantSampler1D} (header file ``npstat/rng/EquidistantSampler1D.hh'')
 generates a sequence of equidistant
points, similar to bin centers of a~histogram with axis limits at 0 and 1.
\cname{RegularSampler1D} (header file ``npstat/rng/RegularSampler1D.hh'')
 generates a sequence of points by
splitting the $[0, 1]$ interval by 2, then splitting all subintervals
by 2, {\it etc}. The points returned are the split locations.
Useful for generating $2^{k}  1$ points when integer
$k$ is not known in advance.
\cname{convertToSphericalRandom} (header file ``npstat/rng/convertToSphericalRandom.hh'')
 converts a multivariate random
number from a unit $d$dimensional cube into a random direction
in $d$ dimensions and one additional random number between 0 and 1.
Useful for generating random numbers according to spherically
symmetrical distributions.
\section{Algorithms Related to Combinatorics}
The NPStat package implements several functions and algorithms related to
permutations of integers and combinatorics:
\cname{factorial} (header file ``npstat/rng/permutation.hh'')
 this function returns an exact factorial if the
result does not exceed the largest unsigned long (up to $12!$ on 32bit
systems and up to $20!$ on 64bit ones).
\cname{ldfactorial} (header file ``npstat/rng/permutation.hh'')
 this function returns an approximate
factorial up to $1754!$ as a long double.
\cname{logfactorial} (header file ``npstat/rng/permutation.hh'')
 natural logarithm of a factorial, up to $\ln((2^{32}  1)!)$,
as a long double.
\cname{binomialCoefficient} (header file ``npstat/nm/binomialCoefficient.hh'')
 calculates the binomial coefficients
$C_N^M = \frac{N!}{M! (NM)!}$ using an algorithm which avoids
overflows.
\cname{orderedPermutation} (header file ``npstat/rng/permutation.hh'')
 this function can be used to iterate
over permutations of numbers $\{0, 1, ..., N1\}$ in a systematic way.
It generates a unique permutation of such numbers given a~nonnegative input
integer below $N!$.
\cname{permutationNumber} (header file ``npstat/rng/permutation.hh'')
 inverse of \cname{orderedPermutation}:
maps a permutation of numbers $\{0, 1, ..., N1\}$ into a unique
integer between $0$ and $N!$.
\cname{randomPermutation} (header file ``npstat/rng/permutation.hh'')
 generates random permutations of numbers
$\{0, 1, ..., N1\}$ in which probability of every permutation
is the same.
\cname{NMCombinationSequencer} (header file ``npstat/stat/NMCombinationSequencer.hh'')
 this class iterates over all
possible choices of $j_1, ..., j_M$ from
$N$ possible values for each $j_k$ in such a way that all $j_1, ..., j_M$
are distinct and appear in the sequence in the increasing order,
last index changing most often. Naturally, the total number of
all such choices equals to the number of ways to pick $M$ distinct
items out of $N$: it is the binomial coefficient $C_N^M$.
\section{Numerical Analysis Utilities}
\label{sec:utilities}
The NPStat package includes a menagerie of numerical analysis utilities
designed, primarily, to support the statistical calculations described
in the previous sections. They are placed in the ``nm'' directory
of the package. A number of these utilities can be used as
standalone tools. If the corresponding header file
is not mentioned explicitly in the descriptions below,
it is ``npstat/nm/NNNN.hh'', where NNNN stands
for the actual name of the class or function.
\cname{ConvolutionEngine1D} and \cname{ConvolutionEngineND}  These
classes encapsulate
the NPStat interface to the FFTW package~\cite{ref:fftw}.
They can be used to performs DFFT convolutions of onedimensional
and multivariate functions, respectively.
\cname{EquidistantInLinearSpace} (header file ``npstat/nm/EquidistantSequence.hh'')
 A sequence of equidistant points.
For use with algorithms that take a vector of points as one of
their parameters.
\cname{EquidistantInLogSpace} (header file ``npstat/nm/EquidistantSequence.hh'')
 A sequence of points whose logarithms are equidistant.
\cname{findRootInLogSpace}  templated numerical equation solving
for 1$d$ functions (or for 1$d$ subspaces of multivariate functions)
using interval division. It is assumed that the solution can be
represented as a product of some object ({\it e.g.,} a vector)
by a positive real number,
and that number is then searched for.
\cname{GaussHermiteQuadrature} and \cname{GaussLegendreQuadrature}
 templated GaussHermite and GaussLegendre quadratures for
onedimensional functions. Internally, calculations
are performed in long double precision. Of course,
lower precision functions can be integrated as well, with
corresponding reduction in the precision of the result.
\cname{rectangleIntegralCenterAndSize} (header file ``npstat/nm/rectangleQuadrature.hh'')
 GaussLegendre cubatures on
rectangular and hyperrectangular domains using tensor product
integration\footnote{``Tensor product integration'' simply means
that the locations at which the function is evaluated and
corresponding weights are determined by sequential application
of GaussLegendre quadratures in each dimension.}.
\cname{goldenSectionSearchInLogSpace} (header file ``npstat/nm/goldenSectionSearch.hh'')
 templated numerical search
for a minimum of 1$d$ functions (or for 1$d$ subspaces of multivariate
functions) using the golden section method.
It is assumed that location of the minimum can be
represented as a product of some object ({\it e.g.,} a vector)
by a positive constant, and that constant is then searched for.
\cname{goldenSectionSearchOnAGrid} (header file ``npstat/nm/goldenSectionSearch.hh'')
 search for a minimum using coordinates restricted to a userdefined
onedimensional grid. Appropriate for use with functions which are expensive
to evaluate and for which the user has some prior idea about their smoothness.
\cname{parabolicExtremum} (header file ``npstat/nm/MathUtils.hh'')
 determine extremum of a~parabola passing through three given points
on a plane. Can be used in combination with \cname{goldenSectionSearchOnAGrid}
to refine location of the minimum.
\cname{GridAxis}  This class can be used to define an axis of
a rectangular grid with nonuniform spacing of points. The complementary
class \cname{UniformAxis} works more efficiently for representing
equidistant points, while the class \cname{DualAxis} can be
used to represent both uniform and nonuniform grids.
\cname{interpolate\_linear}, \cname{interpolate\_quadratic},
\cname{interpolate\_cubic} (these three functions are declared in the
header file ``npstat/nm/interpolate.hh'')
 linear, quadratic, and cubic
polynomials with given values at two, three, and four equidistant
points, respectively.
\cname{LinInterpolatedTable1D}  persistent onedimensional lookup table
with linear interpolation between the tabulated values. Useful
for representing arbitrary onedimensional functions in case
the full numerical precision is not required. If the table is
monotonous, the inverse table can be constructed automatically.
\cname{LinInterpolatedTableND}  persistent multidimensional lookup table
with multilinear interpolation between the tabulated values,
as in Eq~\ref{eq:multilinear}. Extrapolation beyond
the grid boundaries is supported as well. This class is useful
for representing arbitrary functions in case the full numerical precision
is not required. \cname{GridAxis}, \cname{UniformAxis}, or \cname{DualAxis}
class (or userdeveloped
classes with similar sets of methods) can be used
to define grid point locations.
Note that simple locationbased lookup of stored values
(without interpolation)
can be trivially performed with the \cname{closestBin} method of the
\cname{HistoND} class. Lookup of histogram bin values with interpolation
can be performed by the \cname{interpolateHistoND} function
(header file ``npstat/stat/interpolateHistoND.hh'').
\cname{LinearMapper1d}, \cname{LogMapper1d}  linear and loglinear
transformations in 1$d$ as functors
(with ``double operator()(const double\& x) const'' method defined).
The \cname{CircularMapper1d} class works similarly to \cname{LinearMapper1d}
in situations with circular data topologies.
\cname{Matrix}  A templated matrix class. Useful for standard matrix
manipulations, solving linear systems, finding eigenvalues and
eigenvectors of symmetric matrices, singular value decomposition,
{\it etc.} Encapsulates NPStat interface to LAPACK~\cite{ref:lapack}.
\cname{findPeak3by3}, \cname{findPeak5by5} (header file ``npstat/nm/findPeak2D.hh'') 
utilities which facilitate peak finding for twodimensional surfaces
which can be contaminated by small amounts of noise ({\it e.g.}, from
roundoff errors). The user can fit a 2$d$ quadratic polynomial
inside a $3 \times 3$
or $5 \times 5$ window by least squares\footnote{A fast method
utilizing discrete orthogonal polynomial expansion is used internally.}
and check whether that polynomial has
an extremum inside the window.
Initially intended for studying 2$d$ loglikelihoods using sliding
windows.
\cname{solveQuadratic}, \cname{solveCubic} (header file ``npstat/nm/MathUtils.hh'')
 solutions of quadratic
and cubic equations, respectively, by numerically sound
methods\footnote{The textbook formula
$x_{1,2} = \frac{b \pm \sqrt{b^2  4 a c}}{2 a}$
for the roots of quadratic
equation $a x^2 + b x + c =0$
is also a~prime example of a numerical analysis pitfall.
A minimal modification,
$x_1 = \frac{b + (I(b \ge 0)  I(b < 0))\sqrt{b^2  4 a c}}{2 a}$,
$x_2 = \frac{c}{a x_1}$, avoids the subtractive
cancellation problem in the $x_{1,2}$ numerator.}.
\cname{ndUnitSphereVolume} (header file ``npstat/nm/MathUtils.hh'')
 volume of the $n$dimensional unit sphere.
\cname{polyAndDeriv} (header file ``npstat/nm/MathUtils.hh'')
 monomial series $\sum_{k=0}^M c_k x^k$
and its derivative with respect to $x$, templated on the type
of coefficients $c_k$.
\cname{polySeriesSum}, \cname{legendreSeriesSum},
\cname{gegenbauerSeriesSum}, \cname{chebyshevSeriesSum} (all of these
functions are declared in the header file ``npstat/nm/MathUtils.hh'')
 templated
series of onedimensional monomials, Legendre polynomials, Gegenbauer
polynomials, and Chebyshev polynomials, respectively. Numerically
sound recursive formulae are used to generate the polynomials.
\cname{hermiteSeriesSumProb}, \cname{hermiteSeriesSumPhys} (header
file ``npstat/nm/MathUtils.hh'')  templated series of
``probabilist'' and ``physicist'' Hermite polynomials, respectively.
\cname{chebyshevSeriesCoeffs} (header
file ``npstat/nm/MathUtils.hh'')  utility for approximating
mathematical functions with Chebyshev polynomials.
\cname{OrthoPoly1D}, \cname{OrthoPolyND}  uni and multivariate
orthogonal polynomials on equidistant rectangular grids with
arbitrary weight functions. In addition to producing
the polynomials themselves, these classes can
be used to calculate polynomial series,
polynomial series expansion coefficients for gridded
functions, and polynomial filters defined by Eq.~\ref{eq:effk}
(but normalized so that the sum of filter coefficients is 1).
The NPStat package also includes implementations of various
special functions needed in statistical calculations (incomplete
gamma function and its inverse, incomplete beta function, {\it etc}).
These functions are declared in the header file
``npstat/nm/SpecialFunctions.hh''.
\cleardoublepage
\phantomsection
\addcontentsline{toc}{section}{References}
\begin{thebibliography}{99}
\bibitem{ref:minuit}
Minuit2 Minimization Package,
\href{http://www.cern.ch/minuit/}{http://www.cern.ch/minuit/}
\bibitem{ref:geners}
Geners — Generic Serialization for C++,
\href{http://geners.hepforge.org/}{http://geners.hepforge.org/}
\bibitem{ref:cpp11} ISO/IEC Standard 14882:2011
{\it Programming Language C++} (2011).
\bibitem{ref:linearbinning}
M.C.~Jones, and H.W.~Lotwick, ``On the Errors Involved in Computing the Empirical Characteristic Function'', {\it Journal of Statistical Computation and Simulation} {\bf 17}, 133 (1983).
\bibitem{ref:sampleskewkurt}
D.N. Joanes and C.A. Gill, ``Comparing measures of sample skewness
and kurtosis'', {\it The Statistician} {\bf 47}, 183 (1998).
\bibitem{ref:ratioofnormals}
D.V. Hinkley, ``On the ratio of two correlated normal random variables'',
{\it Biometrika} {\bf 56}, 635 (1969).
\bibitem{ref:johnson}
N.L. Johnson, ``Systems of Frequency Curves Generated by Methods of Translation'', {\it Biometrika} {\bf 36}, 149 (1949).
\bibitem{ref:johnsonbook}
W.P. Elderton and N.L. Johnson, ``Systems of Frequency Curves'',
Cambridge University Press (1969).
\bibitem{ref:hahnshap}
G.J. Hahn and S.S. Shapiro, ``Statistical Models in Engineering'',
Wiley (1994).
\bibitem{ref:draper}
J. Draper, ``Properties of Distributions Resulting from Certain Simple Transformations of the Normal Distribution'', {\it Biometrika} {\bf 39}, 290 (1952).
\bibitem{ref:hill}
I.D. Hill, R. Hill, and R. L. Holder,
``Algorithm AS 99: Fitting Johnson Curves by Moments'',
{\it Applied Statistics} {\bf 25}, 180 (1976).
\bibitem{ref:handcock}
M.S. Handcock and M. Morris, ``Relative Distribution Methods'',
{\it Sociological Methodology} {\bf 28}, 53 (1998).
\bibitem{ref:thas}
O. Thas, ``Comparing Distributions'', Springer Series in Statistics (2009).
\bibitem{ref:kdetransform}
L. Yang and J.S. Marron, ``Iterated TransformationKernel Density Estimation'',
{\it Journal of the American Statistical Association} {\bf 94}, 580 (1999).
\bibitem{ref:smoothtest}
J. Neyman, ``'Smooth Test' for Goodness of Fit'', {\it Skandinavisk Aktuarietidskrift} {\bf 20}, 150~(1937).
\bibitem{ref:copulabook}
R.B.~Nelsen, ``An Introduction to Copulas'', 2\supers{nd} Ed.,
Springer Series in Statistics (2006).
\bibitem{ref:eightqueens}
\href{http://en.wikipedia.org/wiki/Eight_queens_puzzle}{``Eight queens puzzle'', Wikipedia.}
\bibitem{ref:eightrooks}
\href{http://en.wikipedia.org/wiki/Rook_polynomial}{``Rook polynomial'', Wikipedia.}
\bibitem{ref:copulaentropy}
J.~Ma and Z.~Sun, ``Mutual information is copula entropy'',
\href{http://arxiv.org/abs/0808.0845}{arXiv:0808.0845v1} (2008).
\bibitem{ref:read}
A.L.~Read, ``Linear interpolation of histograms'',
{\it Nucl. Instr. Meth.} {\bf A 425}, 357 (1999).
\bibitem{ref:silverman} B.W.~Silverman, ``Density Estimation for Statistics
and Data Analysis'', Chapman \& Hall (1986).
\bibitem{ref:izenmann} A.J.~Izenman, ``Recent Developments in Nonparametric Density Estimation'', {\it Journal of the American Statistical Association} {\bf 86}, 205 (1991).
\bibitem{ref:kde} D.W.~Scott,
``Multivariate Density Estimation: Theory, Practice, and Visualization'',
Wiley (1992).
\bibitem{ref:bwopt} M.C.~Jones, J.S.~Marron and S.J.~Sheather,
``A Brief Survey of Bandwidth Selection for Density Estimation'',
{\it Journal of the American Statistical Association} {\bf 91}, 401 (1996).
\bibitem{ref:loaderbw} C.R.~Loader,
``Bandwidth Selection: Classical or Plugin?'',
{\it The Annals of Statistics} {\bf 27}, 415 (1999).
\bibitem{ref:kernsmooth} M.P.~Wand and M.C.~Jones,
``Kernel Smoothing'', Chapman \& Hall (1995).
\bibitem{ref:chiu2} S.T. Chiu,
``An Automatic Bandwidth Selector for Kernel Density Estimation'',
{\it Biometrika} {\bf 79}, 771 (1992).
\bibitem{ref:chiumultivar} T.J. Wu and M.H. Tsai,
``Root n bandwidths selectors in multivariate kernel density estimation'',
{\it Probab. Theory Relat. Fields} {\bf 129}, 537 (2004).
\bibitem{ref:discretizedkde} M.C.~Jones,
``Discretized and Interpolated Kernel Density Estimates'',
{\it Journal of the American Statistical Association} {\bf 84}, 733 (1989).
\bibitem{ref:fastkde1} C.~Yang, R.~Duraiswami, N.A.~Gumerov and L.~Davis,
``Improved Fast Gauss Transform and Efficient Kernel Density Estimation'',
in {\it Proceedings of the Ninth IEEE International
Conference on Computer Vision} (2003).
\bibitem{ref:fastkde2} D.~Lee, A.G.~Gray, and A.W. Moore,
``DualTree Fast Gauss Transforms'',
\href{http://arxiv.org/abs/1102.2878}{arXiv:1102.2878v1} (2011).
\bibitem{ref:bwgaussmix} J.S. Marron and M.P. Wand,
``Exact Mean Integrated Squared Error'', {\it The Annals Of Statistics}
{\bf 20}, 712 (1992).
\bibitem{ref:abramson} I.S.~Abramson, ``On Bandwidth Variation in Kernel
Estimates  A Square Root Law'', {\it The Annals Of Statistics}
{\bf 10}, 1217 (1982).
\bibitem{ref:placida} P.D.A. Dassanayake,
``Local Orthogonal Polynomial Expansion for Density Estimation'',
M.S. Thesis, Texas Tech University (2012).
+\bibitem{ref:lorpe} D.P.A. Dassanayake, I. Volobouev, and A.A. Trindade,
+``Local Orthogonal Polynomial Expansion for Density Estimation'',
+{\it Journal of Nonparametric Statistics} {\bf 29}, 806 (2017).
+
\bibitem{ref:betakern} S.X. Chen, ``Beta kernel estimators for
density functions'', {\it Computational Statistics \& Data Analysis}
{\bf 31}, 131 (1999).
\bibitem{ref:babu} G.J. Babu, A.J. Canty, and Y.P. Chaubey,
``Application of Bernstein Polynomials for smooth estimation
of a distribution and density function'', {\it Journal of Statistical
Planning and Inference} {\bf 105}, 377 (2002).
\bibitem{ref:sinkhorn} R. Sinkhorn and P. Knopp,
``Concerning Nonnegative Matrices and Doubly Stochastic Matrices'',
{\it Pacific Journal of Mathematics} {\bf 21}, 343 (1967).
\bibitem{ref:khoury} R. Khoury, ``Closest Matrices in the Space of
Generalized Doubly Stochastic Matrices'', {\it Journal of
Mathematical Analysis and Applications} {\bf 222}, 562 (1998).
\bibitem{ref:lapack}
LAPACK — Linear Algebra PACKage,
\href{http://www.netlib.org/lapack/}{http://www.netlib.org/lapack/}
\bibitem{ref:localreg} T.J.~Hastie, ``Nonparametric Logistic Regression'',
SLAC PUB3160, June 1983.
\bibitem{ref:locfit}
CRAN  Package locfit,
\href{http://cran.rproject.org/web/packages/locfit/}
{http://cran.rproject.org/web/packages/locfit/}
\bibitem{ref:lvm} K.~Levenberg, ``A Method for the Solution
of Certain NonLinear Problems in Least Squares'',
{\it Quarterly of Applied Mathematics} {\bf 2}, 164 (1944).
D.~Marquardt, ``An Algorithm for LeastSquares Estimation of Nonlinear Parameters'', {\it SIAM Journal on Applied Mathematics} {\bf 11}, 431 (1963).
\bibitem{ref:kdtree} J.L.~Bentley, ``Multidimensional
Binary Search Trees Used for Associative Searching'',
{\it Communications of the ACM} {\bf 18}, 509 (1975).
\bibitem{ref:localquantreg} H.J. Wang and L. Wang,
``Locally Weighted Censored Quantile Regression'',
{\it Journal of the American Statistical Association} {\bf 104}, 487 (2009).
\bibitem{ref:unfoldingrev}
F.~Span\`o, ``Unfolding in particle physics:
a window on solving inverse problems'',
{\it EPJ Web of Conferences} {\bf 55}, 03002 (2013).
\bibitem{ref:numrecipes}
W.H.~Press, S.A.~Teukolsky, W.T.~Vetterling, and B.P.~Flannery,
``Numerical Recipes: the Art of Scientific Computing'', 3rd ed.,
Cambridge University Press (2007).
\bibitem{ref:deconvolutionbook}
A.~Meister, ``Deconvolution Problems in Nonparametric Statistics'',
Springer Lecture Notes in Statistics, Vol. 193 (2009).
\bibitem{ref:svdunfold}
A. H\"ocker and V. Kartvelishvili, ``SVD approach to data unfolding'',
{\it Nuclear Instruments and Methods in Physics Research A}
{\bf 372}, 469 (1996).
\bibitem{ref:dagounfold}
G. D'Agostini, ``A multidimensional unfolding method based on Bayes' theorem'',
{\it Nuclear Instruments and Methods in Physics Research A}
{\bf 362}, 487 (1995).
\bibitem{ref:nychka}
D. Nychka, ``Some Properties of Adding a Smoothing Step to the EM Algorithm'',
{\it Statistics \& Probability Letters} {\bf 9}, 187 (1990).
\bibitem{ref:smoothedem}
B.W. Silverman \etal, ``A Smoothed EM Approach to Indirect Estimation Problems,
with Particular Reference to Stereology and Emission Tomography'',
{\it Journal of the Royal Statistical Society B} {\bf 52}, 271 (1990).
\bibitem{ref:aic}
H.~Akaike, ``A new look at the statistical model identification'',
{\it IEEE Transactions on Automatic Control} {\bf 19}, 716 (1974).
\bibitem{ref:modelsel}
K.P. Burnham and D.R. Anderson, ``Model Selection and MultiModel Inference:
A~Practical InformationTheoretic Approach'', 2nd ed., Springer (2004).
\bibitem{ref:eigen}
Eigen  a C++ template library for linear algebra:
matrices, vectors, numerical solvers, and related algorithms,
\href{http://eigen.tuxfamily.org/}{http://eigen.tuxfamily.org/}
\bibitem{ref:trlan}
K.~Wu and H.~Simon, ``ThickRestart Lanczos Method
for Large Symmetric Eigenvalue Problems'',
{\it SIAM Journal on Matrix Analysis and Applications} {\bf 22}, 602 (2000).
\bibitem{ref:Niederreiter}
H. Niederreiter, ``Random Number Generation and QuasiMonte
Carlo Methods'', Society for Industrial and Applied Mathematics (SIAM),
Philadelphia (1992).
\bibitem{ref:mercenne} M. Matsumoto and T. Nishimura, ``Mersenne Twister:
A 623Dimensionally Equidistributed Uniform PseudoRandom Number Generator'',
{\it ACM Transactions on Modeling and Computer Simulation} {\bf 8}, 3 (1998).
\bibitem{ref:sobol} I. Sobol and Y.L. Levitan,
``The Production of Points Uniformly Distributed in a Multidimensional Cube'',
Tech. Rep. 40, Institute of Applied Mathematics,
USSR Academy of Sciences, 1976 (in Russian).
\bibitem{ref:hosobol} J. Dick, ``Higher order scrambled digital nets achieve the optimal rate of the root mean square error for smooth integrands'',
\href{http://arxiv.org/abs/1007.0842}{arXiv:1007.0842} (2010).
\bibitem{ref:fftw}
FFTW (Fastest Fourier Transform in the West),
\href{http://www.fftw.org/}{http://www.fftw.org/}
\end{thebibliography}
\cleardoublepage
\phantomsection
\addcontentsline{toc}{section}{Functions and Classes}
\printindex
\end{document}
Index: trunk/INSTALL
===================================================================
 trunk/INSTALL (revision 674)
+++ trunk/INSTALL (revision 675)
@@ 1,50 +1,51 @@
The NPStat package depends on several other packages which must be
installed first:
1) Double precision version of the FFTW library from www.fftw.org
2) The "Geners" package from http://geners.hepforge.org/
3) The "KSTest" package from http://kstest.hepforge.org/
4) Fortran LAPACK and BLAS libraries liblapack.so and libblas.so
(installed on many systems by default).
If these packages are not yet installed on your machine, please
install them first following the instructions in their corresponding
INSTALL files. Make sure that your C++ compiler supports C++11.
A few NPStat algorithms need an external minimization engine. These
algorithms can be easily accessed if the Minuit2 minimization package
is installed (see http://www.cern.ch/minuit/) However, Minuit2 is not
needed in order to compile NPStat  Minuit2 is interfaced to NPStat
via header files only.
Set the PKG_CONFIG_PATH environment variable to the name of the
directory where the FFTW and Geners libraries install their package
configuration files "fftw3.pc" and "geners.pc", respectively. Usually,
this directory is /usr/local/lib/pkgconfig.
After this, a typical NPStat installation can proceed as follows:
./configure disablestatic withpic
make
make install
The last step may have to be performed as root. Run
./configure help
to view available configuration options.
If you want to compile the python API, you might want to install a
wellintegrated python distribution first, such as the anaconda one.
After completing the NPStat installation steps described above, run
+If you want to compile the python API (NPStat uses python3), you might
+want to install a wellintegrated python distribution first, such as
+the anaconda one (https://www.anaconda.com/download/). After
+completing the NPStat installation steps described above, run
make python
The Python extension will be compiled inside "npstat/swig" directory
but it will not be installed anywhere else. In order to use the python
API, add the absolute path to your "npstat/swig" directory to the
PYTHONPATH environmental variable. Note that compiling the Python
+but it will not be installed anywhere else. In order to use the
+python API, add the absolute path of your "npstat/swig" directory to
+the PYTHONPATH environmental variable. Note that compiling the Python
wrapper can take a long time (tens of minutes).
Index: trunk/npstat/stat/InterpolatedDistro1D1P.hh
===================================================================
 trunk/npstat/stat/InterpolatedDistro1D1P.hh (revision 674)
+++ trunk/npstat/stat/InterpolatedDistro1D1P.hh (revision 675)
@@ 1,104 +1,108 @@
#ifndef NPSTAT_INTERPOLATEDDISTRO1D1P_HH_
#define NPSTAT_INTERPOLATEDDISTRO1D1P_HH_
/*!
// \file InterpolatedDistro1D1P.hh
//
// \brief Interpolation of 1d distributions as a function of one parameter
//
// Author: I. Volobouev
//
// December 2014
*/
#include "geners/CPP11_shared_ptr.hh"
#include "npstat/nm/GridAxis.hh"
#include "npstat/stat/AbsInterpolatedDistribution1D.hh"
namespace npstat {
class InterpolatedDistro1D1P : public AbsDistribution1D
{
public:
#ifndef SWIGBUG
 /** This constructor builds an object which is ready to use */
+ /**
+ // This constructor builds an object which is ready to use.
+ // The "distros" argument must provide one distribution for
+ // each grid cell.
+ */
InterpolatedDistro1D1P(
const GridAxis& axis,
const std::vector >& distros,
double initialParameterValue = 0.0, bool interpolateVertically = false);
#endif // SWIGBUG
/**
// If this constructor is used, the object must be built incrementally,
// using "setGridDistro" calls, one call for each grid cell. After all
// distributions have been set, provide the parameter value with
// "setParameter". Only then one can call methods "density", etc.
*/
InterpolatedDistro1D1P(const GridAxis& axis,
bool interpolateVertically = false);
inline virtual ~InterpolatedDistro1D1P() {delete interpolator_;}
InterpolatedDistro1D1P(const InterpolatedDistro1D1P&);
InterpolatedDistro1D1P& operator=(const InterpolatedDistro1D1P&);
/** Set the distribution corresponding to the given grid point */
void setGridDistro(unsigned cell, const AbsDistribution1D& distro);
/** Set the value of the parameter */
void setParameter(double value);
/** Get the value of the parameter */
inline double getParameter() const {checkReady(); return param_;}
inline const GridAxis& getAxis() const {return axis_;}
inline unsigned nDistros() const {return axis_.nCoords();}
void interpolateVertically(bool b);
inline bool interpolatingVertically() const {return vertical_;}
/** "Virtual copy constructor" */
inline virtual InterpolatedDistro1D1P* clone() const
{return new InterpolatedDistro1D1P(*this);}
inline double density(const double x) const
{checkReady(); return interpolator_>density(x);}
inline double cdf(const double x) const
{checkReady(); return interpolator_>cdf(x);}
inline double exceedance(const double x) const
{checkReady(); return interpolator_>exceedance(x);}
inline double quantile(const double x) const
{checkReady(); return interpolator_>quantile(x);}
//@{
/** Prototype needed for I/O */
virtual inline gs::ClassId classId() const {return gs::ClassId(*this);}
virtual bool write(std::ostream&) const;
//@}
static inline const char* classname()
{return "npstat::InterpolatedDistro1D1P";}
static inline unsigned version() {return 2;}
static InterpolatedDistro1D1P* read(const gs::ClassId& id,
std::istream& in);
protected:
virtual bool isEqual(const AbsDistribution1D&) const;
private:
InterpolatedDistro1D1P();
void rebuildInterpolator();
void checkDistros();
void checkReady() const;
GridAxis axis_;
std::vector > distros_;
AbsInterpolatedDistribution1D* interpolator_;
double param_;
bool vertical_;
bool parameterSet_;
bool haveAllDistros_;
};
}
#endif // NPSTAT_INTERPOLATEDDISTRO1D1P_HH_
Index: trunk/NEWS
===================================================================
 trunk/NEWS (revision 674)
+++ trunk/NEWS (revision 675)
@@ 1,764 +1,764 @@
Version 5.1.0  development
+Version 5.1.0  Nov 17 2019, by I. Volobouev
* Started using pythonspecific features (pythonprepend, etc) in the
SWIG .i files.
* Added function "scannedKSDistance".
* Added Python API for the persistence of several classes.
+* Added Python API for persistence of several classes.
Version 5.0.0  Oct 17 2019, by I. Volobouev
* C++11 support is now mandatory.
* Added more classes and functions to Python API. Overhauled some of
the existing interfaces. Updated everything to python3.
* Increased the maximum order of "classical" Edgeworth expansions to 14.
* Increased the maximum order of conversions between 1d central moments
and cumulants to 20.
* Added "histoQuantiles" function.
* Added "correctDensityEstimateGHU" function.
* Added "randomHistoFill1D" utility function.
* Added ComparisonDistribution1D class.
* Added classes GaussND, LinTransformedDistroND, and DistributionMixND.
* Added a lot of SWIG .i files
Version 4.11.0  July 22 2019, by I. Volobouev
* Added support for cumulant calculations for various Wald statistics
in the Poisson process model.
* Added functions convertCumulantsToCentralMoments and
convertCentralMomentsToCumulants (header file cumulantConversion.hh).
* Added function "cumulantUncertainties".
Version 4.10.0  July 11 2019, by I. Volobouev
* Added SemiInfGaussianQuadrature class.
* Added functions arrayMoment, arrayMoments, and arrayCentralMoments.
* Added enum EdgeworthSeriesMethod and class EdgeworthSeries1D.
* Added DeltaMixture1D class.
* Added enum LikelihoodStatisticType.
* Added functions "mixtureModelCumulants" and "poissonProcessCumulants"
in the header likelihoodStatisticCumulants.hh.
Version 4.9.0  Dec 18 2018, by I. Volobouev
* Added "integratePoly" and "jointIntegral" methods to the
AbsClassicalOrthoPoly1D class.
* Added utility functions "truncatedInverseSqrt" and "matrixIndexPairs".
* Added a number of functions to "orthoPoly1DVProducts.hh".
* Added classes ChebyshevOrthoPoly1st and ChebyshevOrthoPoly2nd inheriting
from AbsClassicalOrthoPoly1D.
* Added class HermiteProbOrthoPoly1D.
* Added FejerQuadrature class.
* Added classe IsoscelesTriangle1D and Logistic1D.
* Added classes AbsKDE1DKernel and KDE1DHOSymbetaKernel.
* Added static function "optimalDegreeHart" to OSDE1D class.
Version 4.8.0  Jul 9 2018, by I. Volobouev
* Added ShiftedLegendreOrthoPoly1D class.
* Added Lanczos method to generate recurrence coefficients for the
ContOrthoPoly1D class.
* Added npstat/stat/orthoPoly1DVProducts.hh file with various utilities
for statistical analyis of chaos polynomials.
Version 4.7.0  Jan 13 2018, by I. Volobouev
* Added "UniPareto1D" distribution (uniform with Pareto tail to the right).
* More coordinate/weight cases for the GaussLegendreQuadrature class.
* Added ContOrthoPoly1D class  continuous orthogonal polynomials with
discrete measure.
* Added functions "linearLeastSquares" and "tdSymEigen" to the Matrix class.
* Added OSDE1D class.
* Added classes LocationScaleFamily1D and SinhAsinhTransform1D.
* Added new functors (CdfFunctor1D, etc) as AbsDistribution1D helpers.
* Small fix in StatAccumulatorArr.cc.
Version 4.6.0  Jan 23 2017, by I. Volobouev
* Updated 1d LOrPE cross validation code (classes AbsBandwidthCV,
BandwidthCVLeastSquares1D, BandwidthCVPseudoLogli1D) for use with
weighted samples in the case the sample itself is available at the
point cross validation is run.
Version 4.5.0  Aug 01 2016, by I. Volobouev
* A small fix in OrthoPolyND.icc (switched from cycling over unsigned to
unsigned long in the scalar product function).
* Implemented GaussHermite quadrature with Gaussian density weight.
* Changed the meaning of Quadratic1D and LogQuadratic1D parameters to be
consistent with Legendre polynomial coefficients on [1, 1] (new
parameters are 1/2 of old).
* Added class MinuitUnbinnedFitFcn1D (to interfaces).
* Added function "findRootNewtonRaphson".
* Added "statUncertainties" header with various functions.
Version 4.4.0  May 9 2016, by I. Volobouev
* Added "timestamp" function.
* Improved implementation of BinnedDensity1D::unscaledQuantile function.
Certain problems caused by roundoff errors are now fixed.
* Added the capability to use the single closest parameter cells (thus
disabling actual interpolation between parameter values, for speed)
to "GridInterpolatedDistribution".
Version 4.3.0  March 19 2016, by I. Volobouev
* Put additional assert statements inside OrthoPolyND.icc to prevent
phony "array subscript is above array bounds" messages in g++ 4.9.2.
* Improved CmdLine.hh.
* Additional methods in CopulaInterpolationND and GridInterpolatedDistribution.
* Added function "volumeDensityFromBinnedRadial".
* Added convenience method "globalFilter" to the OrthoPoly1D class.
* Initiated the transition of the Python API from Python 2 to Python 3.
Version 4.2.0  October 29 2015, by I. Volobouev
* Added interpolation methods for the marginals to classes
"CopulaInterpolationND" and "GridInterpolatedDistribution".
* Removed assert on underflow in the "igamc" function. Now
in such cases this function simply returns 0.0.
Version 4.1.0  July 27 2015, by I. Volobouev
* Made a few additional methods virtual in AbsNtuple.
* Declared methods "columnIndices" of AbsNtuple const (as they should be).
* Added function "weightedCopulaHisto" to build copulas for samples of
weighted points.
* Added function "weightedVariableBandwidthSmooth1D" to use variable
bandwidth smoothing with weighted histograms.
* Added "AbsWeightedMarginalSmoother" interface class for smoothing samples
of weighted points. Modified classes ConstantBandwidthSmoother1D,
JohnsonKDESmoother, and LOrPEMarginalSmoother to support this interface.
* Added class "VariableBandwidthSmoother1D" which implements both
AbsMarginalSmoother and AbsWeightedMarginalSmoother interfaces.
* Implemented crossvalidation for weighted samples.
* Implemented "buildInterpolatedCompositeDistroND" for generic construction
of multivariate transfer functions.
* Implemented "buildInterpolatedDistro1DNP" for generic construction
of univariate transfer functions.
Version 4.0.1  June 17 2015, by I. Volobouev
* Added "dump_qmc" example executable.
Version 4.0.0  June 10 2015, by I. Volobouev
* Complete overhaul of 1d filterbuilding code. Addition of new boundary
methods is a lot easier now. The user API for choosing a LOrPE boundary
method is encapsulated in the new "BoundaryHandling" class.
* Implemented a number of new filter builders with different boundary
treatments.
* Updated the "LocalPolyFilter1D" class so that it holds the local
bandwidth factors derived by the filter builders.
Version 3.8.0  June 1 2015, by I. Volobouev
* Implemented class ConstSqFilter1DBuilder (declared in the header file
npstat/stat/Filter1DBuilders.hh). The "BoundaryMethod" enum has been
extended accordingly. Other files using this enum have been updated.
* Implemented class FoldingSqFilter1DBuilder. Similar to ConstSqFilter1DBuilder
but it also folds the kernel in addition to stretching it.
* Added virtual destructors to a number of classes.
* Added "externalMemArrayND" with various signatures to allow the use of
ArrayND with memory not managed by ArrayND.
* Added move constructors and move assignment operators to ArrayND and
Matrix classes.
Version 3.7.0  May 10 2015, by I. Volobouev
* Better numerical derivative calculation in InterpolatedDistribution1D.cc.
* Added class "LocalMultiFilter1D" for fast generation of filters which
correspond to each orthogonal polynomial separately.
* Added a function calculating the area of ndimensional sphere.
* Added I/O capabilities to the RadialProfileND class.
* Added class "LogRatioTransform1D".
* Added utility function "multiFill1DHistoWithCDFWeights" (header file
histoUtils.hh).
* Avoiding underflow of the incomplete gamma in "convertToSphericalRandom".
Version 3.6.0  April 6 2015, by I. Volobouev
* Fixed Marsaglia's code calculating the AndersonDarling statistics
(it was breaking down for large values of z).
* Added highlevel driver function "simpleVariableBandwidthSmooth1D" to
automatically build the pilot estimate for "variableBandwidthSmooth1D".
* Swithched to log of sigma as Minuit parameter in "minuitFitJohnsonCurves"
instead of sigma (linear sigma would sometimes break the fit when Minuit
would come up with 0 or negative trial value for it).
* Extended "MinuitDensityFitFcn1D" class so that it could be used to fit
nonuniformly binned histograms.
* Adapted "minuitFitJohnsonCurves" function so that it could be used with
histograms templated upon DualHistoAxis.
* Added functions "fillArrayCentersPreservingAreas" and
"canFillArrayCentersPreservingAreas".
* Implemented an interface to monotonous coordinate transformations with
the intent of using them in constructing densities. Implemented a number
of transforms.
* Implemented class "TransformedDistribution1D".
* Added class "VerticallyInterpolatedDistro1D1P".
* Added utility function "fill1DHistoWithCDFWeights".
Version 3.5.0  February 21 2015, by I. Volobouev
* Added "symPDEigenInv" method to the Matrix class.
* Added "variableCount" method to unfolding bandwidth scanner classes.
* Increased the tolerance parameters in JohnsonSu::initialize and in
JohnsonFit constructor.
* Bug fix in function "fillHistoFromText".
Version 3.4.4  January 13 2015, by I. Volobouev
* Corrected handling of some "assert" statements so that the code compiles
correctly with the DNDEBUG option.
Version 3.4.3  January 5 2015, by I. Volobouev
* Implemented class MirroredGauss1D.
* Added method "getOracleData" to class UnfoldingBandwidthScanner1D.
* Bug fix in FoldingFilter1DBuilder::makeOrthoPoly.
Version 3.4.2  December 15 2014, by I. Volobouev
* Implemented InterpolatedDistro1D1P class.
Version 3.4.1  November 07 2014, by I. Volobouev
* Implemented "divideTransforms" function for deconvolutions.
* Implemented the Moyal distribution.
* Added "fillHistoFromText" utility function.
* Added "apply_lorpe_1d" example.
Version 3.4.0  October 01 2014, by I. Volobouev
* Implemented Hadamard product and Hadamard ratio for matrices.
* Bug fix in the "solve_linear_system" lapack interface function.
Version 3.3.1  August 08 2014, by I. Volobouev
* Terminate iterative refinement of the unfolding error propagation
matrix early in case the solution does not seem to improve.
Version 3.3.0  August 05 2014, by I. Volobouev
* Added correction factors to the determination of the number of fitted
parameters in various unfolding procedures.
Version 3.2.0  July 25 2014, by I. Volobouev
* Added "gaussianResponseMatrix" function for nonuniform binning.
* Added Pareto distribution.
* Implemented EMS unfolding with sparse matrices.
* Added methods "getObservedShape" and "getUnfoldedShape" to the AbsUnfoldND
class.
* Bug fix in the assignment operator of ProductDistributionND class. Made
class ProductDistributionND persistent.
* Bug fix in the error propagation for unfolding, in the code which takes
into account the extra normalization constant.
* Added "productResponseMatrix" function to assist in making sparse response
matrices.
* Bug fix in the factory constructor of the Cauchy1D class.
* Completed implementation of the "RatioOfNormals" class.
Version 3.1.0  June 29 2014, by I. Volobouev
* Improved (again) random number generator for the 1d Gaussian distribution.
Something about expectation values of normalized Hermite polynomials over
random numbers made by this generator is still not fully understood. The
standard deviation of these expectations decreases with the polynomial
order (while it should stay constant). It is possible that the numbers
of points used are simply insufficient to sample the tails correctly.
* Implemented smoothed expectationmaximization (a.k.a. D'Agostini) unfolding
for 1d distributions in classes SmoothedEMUnfold1D and MultiscaleEMUnfold1D.
In certain usage scenarios, MultiscaleEMUnfold1D can be more efficient than
SmoothedEMUnfold1D.
* Implemented smoothed expectationmaximization unfolding for multivariate
distributions in a class SmoothedEMUnfoldND.
* Added class "UnfoldingBandwidthScanner1D" to study 1d unfolding behavior
as a function of filter bandwidth.
* Added class "UnfoldingBandwidthScannerND" to study multivariate unfolding
behavior as a function of provided bandwidth values.
* Added DummyLocalPolyFilter1D class useful when a filter is needed which
does not smooth anything.
* Added function "poissonLogLikelihood" (header file npstat/stat/arrayStats.hh).
* Added function "pooledDiscreteTabulated1D" (header file
npstat/stat/DiscreteDistributions1D.hh).
* Implemented class UGaussConvolution1D (convolution of uniform distribution
with a Gaussian).
* Implemented gamma distribution (class Gamma1D).
* Defined interface for comparing binned distributions, AbsBinnedComparison1D.
* Implemented several comparison classes for comparing binned distributions:
PearsonsChiSquared, BinnedKSTest1D, BinnedADTest1D. Class BinnedKSTest1D
pulled in dependence on the "kstest" package.
* Made classes LocalPolyFilter1D, LocalPolyFilterND, and SequentialPolyFilterND
persistent.
* Added code generating dense filter matrices to LocalPolyFilterND and
SequentialPolyFilterND (as needed for unfolding).
* Made class MemoizingSymbetaFilterProvider persistent.
* Implemented function goldenSectionSearchOnAGrid (header file
npstat/nm/goldenSectionSearch.hh).
* Implemented function parabolicExtremum (header npstat/nm/MathUtils.hh).
* Added class DistributionMix1D (header npstat/stat/DistributionMix1D.hh).
* Added interface to solving A*X = B, with matrices X and B, to the Matrix
class (method "solveLinearSystems").
* Added "reshape" methods to the ArrayND class.
* Added "gaussianResponseMatrix" function.
* Added a section on unfolding to the documentation.
* Added "ems_unfold_1d" example program.
Version 3.0.0  March 14 2014, by I. Volobouev
* Added interface to the LAPACK SVD routines.
* Added function "lorpeMise1D" to calculate MISE for arbitrary distributions.
* Added FoldingFilter1DBuilder class.
* Changed interfaces for several highlevel functions to use
FoldingFilter1DBuilder. The major version number got bumped up.
* Split DensityScan1D.hh away from AbsDistribution1D.hh.
Version 2.7.0  March 10 2014, by I. Volobouev
* Added code to optimize operations with diagonal matrices.
* Added discretizedDistance.hh file for simple L1 and L2 distance calculations
with numeric arrays.
* Added base class for future unfolding code.
* The "reset" method of the Matrix class was renamed into "uninitialize"
in order to be consistent with ArrayND.
* Added function "multinomialCovariance1D".
* Added "polyTimesWeight" method to the OrthoPoly1D class.
* Added methods "TtimesThis" and "timesT" to the Matrix class. These
methods are more efficient than transpose followed by multiplication.
Version 2.6.0  January 30 2014, by I. Volobouev
* Added function "lorpeBackgroundCVDensity1D" which linearizes calculation
of the cross validation likelihood in semiparametric fits. Argument
"linearizeCrossValidation" was added to MinuitSemiparametricFitFcn1D
constructor, "lorpeBackground1D" function, etc.
* Added the ability to build filters with center point removed to classes
WeightTableFilter1DBuilder and StretchingFilter1DBuilder. The function
"symbetaLOrPEFilter1D" now has an appropriate switch.
* Added "removeRowAndColumn" method to the Matrix class.
* Added CircularBuffer class.
* Added various plugin bandwidth functions which work with noninteger
polynomial degrees.
* Switched to the Legendre polynomial basis for calculating all 1d
orthogonal polynomials (instead of monomial basis).
* Added MemoizingSymbetaFilterProvider class.
* Added "operator+=" method to the MultivariateSumAccumulator class.
* Simplified implementation of the PolyFilterCollection1D class.
File PolyFilterCollection1D.icc is removed.
* Added "RatioOfNormals" 1d distribution function. Only the density is
currently implemented but not the CDF.
* Added ExpMapper1d class.
Version 2.5.0  October 15 2013, by I. Volobouev
* Added "getFilterMatrix" method to the LocalPolyFilter1D class.
* Added "genEigen" method to the Matrix class (for determination of
eigenvalues and eigenvectors of general real matrices).
* Refactored the LAPACK interface so that interface functions to floats
are automatically generated from interface functions to doubles. See
the comment at the end of the "lapack_interface.icc" file for the shell
commands to do this.
Version 2.4.0  October 6 2013, by I. Volobouev
* Added functions "lorpeBackground1D", "lorpeBgCVPseudoLogli1D", and
"lorpeBgLogli1D".
* Added minuit interface classes "MinuitLOrPEBgCVFcn1D" and
"MinuitSemiparametricFitFcn1D".
* Added "ScalableDensityConstructor1D" class for use with Minuit interface
functions.
* Added classes AbsSymbetaFilterProvider and SymbetaPolyCollection1D.
Version 2.3.0  October 1 2013, by I. Volobouev
* Allowed point dimensionality to be larger than the histogram dimensionality
in the "empiricalCopulaHisto" function.
* Added "keepAllFilters" method to AbsFilter1DBuilder and all derived classes.
* Implemented exclusion regions for WeightTableFilter1DBuilder and
StretchingFilter1DBuilder.
* "symbetaLOrPEFilter1D" function (in the header LocalPolyFilter1D.hh)
is updated to take the exclusion mask argument.
* Added "continuousDegreeTaper" function which can do something meaningful
with the continuous LOrPE degree parameter.
Version 2.2.0  June 30 2013, by I. Volobouev
* Added classes DiscreteBernsteinPoly1D and BernsteinFilter1DBuilder.
* Added classes DiscreteBeta1D and BetaFilter1DBuilder.
* Added BifurcatedGauss1D class to model Gaussianlike distributions with
different sigmas on the left and right sides.
* Added virtual destructors to the classes declared in the Filter1DBuilders.hh
header.
* Added a method to the Matrix template to calculate Frobenius norm.
* Added methods to the Matrix template to calculate row and column sums.
* Added "directSum" method to the Matrix template.
* Added constructor from a subrange of another matrix to the Matrix template.
* Added code to the LocalPolyFilter1D class that generates a doubly
stochastic filter out of an arbitrary filter.
* Added "npstat/nm/definiteIntegrals.hh" header and corresponding .cc file
for various infrequently used integrals.
* Added "betaKernelsBandwidth" function.
Version 2.1.0  June 20 2013, by I. Volobouev
* Fixed couple problems which showed up in the robust regression code
due to compiler update.
* Fixed CensoredQuantileRegressionOnKDTree::process method (needed this>
dereference for some member).
Version 2.0.0  June 15 2013, by I. Volobouev
* Updated to use "Geners" version 1.3.0. A few interfaces were changed
(API for the string archives was removed because Geners own string archive
facilities are now adequate) so the major version number was bumped up.
Version 1.6.0  June 12 2013, by I. Volobouev
* Updated some documentation.
* Updated fitCompositeJohnson.icc to use simplified histogram constructors.
* Bug fix in the "minuitLocalQuantileRegression1D" function.
* Changed the "quantileBinFromCdf" function to use unsigned long type for
array indices.
* Added "weightedLocalQuantileRegression1D" function (namespace npsi) for
local regression with single predictor on weighted points.
Version 1.5.0  May 23 2013, by I. Volobouev
* Added interfaces to LAPACK routines DSYEVD, DSYEVR, and corresponding
single precision versions.
* Added the "symPSDefEffectiveRank" method to the Matrix class for
calculating effective ranks of symmetric positive semidefinite matrices.
* Added converting constructor and assignment operator to the Matrix class.
* Run the GramSchmidt procedure twice when orthogonal polynomials are
derived in order to improve orthogonality.
Version 1.4.0  May 20 2013, by I. Volobouev
* Added the "append" method to the AbsNtuple class.
Version 1.3.0  May 10 2013, by I. Volobouev
* Added the code for Hermite polynomial series.
* Improved random number generator for the 1d Gaussian distribution.
* Added a framework for discrete 1d distributions as well as two
concrete distribution classes (Poisson1D, DiscreteTabulated1D).
* Added functions "readCompressedStringArchiveExt" and
"writeCompressedStringArchiveExt" which can read/write either compressed
or uncompressed string archives, distinguished by file extension.
Version 1.2.1  March 22 2013, by I. Volobouev
* Improved CmdLine.hh in the "examples/C++" directory.
* Added class QuantileTable1D.
* Added classes LeftCensoredDistribution and RightCensoredDistribution.
Version 1.2.0  March 13 2013, by I. Volobouev
* Added convenience "fill" methods to work with the ntuples which have
small number of columns (up to 10).
* Fixed a bug in AbsRandomGenerator for univariate generators making
multivariate points.
* Added LOrPEMarginalSmoother class.
Version 1.1.1  March 11 2013, by I. Volobouev
* Added utility function "symbetaLOrPEFilter1D" which creates 1d LOrPE
filters using kernels from the symmetric beta family (and the Gaussian).
* Added high level driver function "lorpeSmooth1D".
* Allowed variables with zero variances for calculation of correlation
coefficients in "MultivariateSumsqAccumulator". Such variables will
have zero correlation coefficients with all other variables.
* Added rebinning constructor to the HistoND class.
Version 1.1.0  March 8 2013, by I. Volobouev
* Changed NUHistoAxis::fltBinNumber method to produce correct results
with interpolation degree 0. It is not yet obvious which method would
work best for higher interpolation degrees.
* Added functions for converting between StringArchive and python bytearray.
They have been placed in a new header: wrap/stringArchiveToBinary.hh.
* Added methods "exportMemSlice" and "importMemSlice" to ArrayND. These
methods allow for filling array slices from unstructured memory buffers
and for exporting array slices to such memory buffers.
* Added "simpleColumnNames" function (header file AbsNtuple.hh) to generate
trivial column names when ntuple column names are not important.
* Added functions "neymanPearsonWindow1D" and "signalToBgMaximum1D".
They are declared in a new header npstat/neymanPearsonWindow1D.hh.
Version 1.0.5  December 17 2012, by I. Volobouev
* Flush string archives before writing them out in stringArchiveIO.cc.
* Added class TruncatedDistribution1D.
Version 1.0.4  November 14 2012, by I. Volobouev
* Added utilities for reading/writing Geners string archives to files.
* Added BinSummary class.
* Doxygen documentation improved. Every header file in stat, nm, rng,
and interfaces now has a brief description.
Version 1.0.3  September 27 2012, by I. Volobouev
* Fixed some bugs related to moving StorableMultivariateFunctor code
from "nm" to "stat".
Version 1.0.2  August 6 2012, by I. Volobouev
* Added converting copy constructor to the "LinInterpolatedTableND" class.
* Added StorableMultivariateFunctor class (together with the corresponding
reader class).
* Added StorableInterpolationFunctor class which inherits from the above
and can be used with interpolation tables.
* Added StorableHistoNDFunctor class which inherits from
StorableMultivariateFunctor and can be used to interpolate histogram bins.
* Added "transpose" method to HistoND class.
* Created DualAxis class.
* Created DualHistoAxis class.
* Added conversion functions between histogram and grid axes.
* Added "mergeTwoHistos" function for smooth merging of two histograms.
* Added "ProductSymmetricBetaNDCdf" functor to be used as weight in
merging histograms.
* Added CoordinateSelector class.
Version 1.0.1  June 29 2012, by I. Volobouev
* Implemented class LinInterpolatedTableND with related supporting code.