What is Exploratory Data Analysis?

Exploratory Data Analysis (EDA) in statistics is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis-testing task. Exploratory data analysis was promoted to encourage statisticians to explore the data and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking the assumptions required for model fitting and hypothesis testing, handling missing values, and transforming variables as needed; EDA encompasses IDA. Exploratory data analysis, robust statistics, nonparametric statistics, and the development of statistical programming languages all facilitated statisticians’ work on scientific and engineering problems. There are a number of tools that are useful for EDA, but EDA is characterized more by its attitude than by particular techniques. Typical graphical techniques used in EDA include the box plot, histogram, multi-vari chart, run chart, Pareto chart, scatter plot, stem-and-leaf plot, and parallel coordinates.
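As a minimal illustration of the graphical side of EDA, the sketch below computes a quick numeric summary and draws two of the plots named above (a histogram and a box plot) with pandas and matplotlib. The dataset and the column name "response_time" are made up for the example.

```python
# Minimal EDA sketch: summary statistics plus two typical plots
# (histogram and box plot). The data here is synthetic.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({"response_time": rng.lognormal(mean=1.0, sigma=0.4, size=500)})

print(df["response_time"].describe())  # quick numeric summary

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(df["response_time"], bins=30)
axes[0].set_title("Histogram")
axes[1].boxplot(df["response_time"])
axes[1].set_title("Box plot")
plt.tight_layout()
plt.show()
```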

What is Euclidean Distance?

Euclidean distance in mathematics is the “ordinary” (i.e. straight-line) distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space, and the associated norm is called the Euclidean norm. Older literature refers to the metric as the Pythagorean metric. A generalized term for the Euclidean norm is the L2 norm or L2 distance. The Euclidean distance between points p and q is the length of the line segment connecting them. In Cartesian coordinates, if p = (p1, p2, …, pn) and q = (q1, q2, …, qn) are two points in Euclidean n-space, then the distance d from p to q (or from q to p) is given by the Pythagorean formula.
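In symbols, the Pythagorean formula referred to above is d(p, q) = sqrt((q1 − p1)² + (q2 − p2)² + … + (qn − pn)²). A minimal sketch of that computation in Python follows; the example points are arbitrary.

```python
# Euclidean (L2) distance between two points in n-dimensional space,
# following the Pythagorean formula: d = sqrt(sum_i (q_i - p_i)^2).
import math

def euclidean_distance(p, q):
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

# Example: distance between (1, 2, 3) and (4, 6, 3) is sqrt(9 + 16 + 0) = 5.
print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # 5.0
```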

What is Estimation?

Estimation is the process of finding an estimate, or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable. The value is nonetheless usable because it is derived from the best information available. Typically, estimation involves “using the value of a statistic derived from a sample to estimate the value of a corresponding population parameter”. The sample provides information that can be projected, through various formal or informal processes, to determine a range most likely to describe the missing information. An estimate that turns out to be incorrect will be an overestimate if the estimate exceeded the actual result or an underestimate if the estimate fell short of the actual result. Estimation is often done by sampling, which is counting a small number of examples and projecting that number onto a larger population. Estimates can similarly be generated by projecting results from polls or surveys onto the entire population. Estimation is important in business and economics because too many variables exist to figure out how large-scale activities will develop.
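As a small illustration of “using the value of a statistic derived from a sample to estimate the value of a corresponding population parameter”, the sketch below estimates a population mean from a random sample. The population, sample size, and numbers are made up for the example.

```python
# Estimating a population mean from a sample: the sample mean is the
# statistic, the population mean is the parameter it estimates.
import random

random.seed(42)
population = [random.gauss(mu=50.0, sigma=10.0) for _ in range(100_000)]

sample = random.sample(population, k=200)    # small sample
estimate = sum(sample) / len(sample)         # sample mean (the statistic)
actual = sum(population) / len(population)   # true parameter (usually unknown)

print(f"estimate: {estimate:.2f}, actual: {actual:.2f}")
# If the estimate exceeds the actual value it is an overestimate;
# if it falls short, it is an underestimate.
```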

What are Eigenvectors?

Eigenvectors are a special set of vectors associated with a linear system of equations (i.e., a matrix equation) that are sometimes also known as characteristic vectors, proper vectors, or latent vectors. The determination of the eigenvectors and eigenvalues of a system is extremely important in physics and engineering, where it is equivalent to matrix diagonalization and arises in such common applications as stability analysis, the physics of rotating bodies, and small oscillations of vibrating systems, to name only a few. Each eigenvector is paired with a corresponding so-called eigenvalue. Mathematically, two different kinds of eigenvectors need to be distinguished: left eigenvectors and right eigenvectors.
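A minimal sketch of the eigenvector computation with NumPy follows; the 2×2 matrix used is an arbitrary example. np.linalg.eig returns right eigenvectors, and the left eigenvectors of A are the right eigenvectors of A transposed.

```python
# Right eigenvectors and eigenvalues of a small matrix with NumPy.
# Each eigenpair (lambda, v) satisfies the defining relation A v = lambda v.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)  # columns are right eigenvectors
print(eigenvalues)                            # e.g. [3. 1.]

# Verify A v = lambda v for the first eigenpair.
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))  # True

# Left eigenvectors w satisfy w^T A = lambda w^T; they can be obtained
# as the right eigenvectors of the transposed matrix.
_, left_eigenvectors = np.linalg.eig(A.T)
```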