Machine Translation (MT) is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another. On a basic level, MT performs simple substitution of words in one language for words in another, but that alone usually cannot produce a good translation of a text because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with corpus, statistical, and neural techniques is a rapidly growing field that is leading to better translations, better handling of differences in linguistic typology, translation of idioms, and the isolation of anomalies. Current machine translation software often allows for customization by domain or profession, improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal language is used. It follows that machine translation of government and legal documents more readily produces usable output than translation of conversation or less standardized text. Improved output quality can also be achieved by human intervention. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is.
Loss Function in mathematical optimization, statistics, decision theory and machine learning is a function that maps an event or values of one or more variables onto a real number intuitively representing some “cost” associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative (sometimes called a reward function, a profit function, a utility function, a fitness function, etc.), in which case it is to be maximized. In statistics, a loss function is typically used for parameter estimation, and the event in question is some function of the difference between estimated and true values for an instance of data. In the context of economics, for example, the loss is usually economic cost or regret. In classification, it is the penalty for an incorrect classification of an example. In actuarial science, it is used in an insurance context to model benefits paid over premiums. In optimal control, the loss is the penalty for failing to achieve the desired value. In financial risk management, the function is precisely mapped to a monetary loss.
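As a minimal sketch of the ideas above, the following Python defines two common loss functions and minimizes an average loss over a grid of candidate estimates. The function names, the toy data, and the grid search are illustrative assumptions, not part of any particular library.

```python
def squared_error(estimate, truth):
    """Squared-error loss: a function of the difference between estimate and truth."""
    return (estimate - truth) ** 2


def zero_one_loss(predicted, actual):
    """Classification loss: a penalty of 1 for an incorrect label, 0 otherwise."""
    return 0 if predicted == actual else 1


def mean_loss(loss, predictions, targets):
    """Average a per-example loss over paired predictions and targets."""
    pairs = list(zip(predictions, targets))
    return sum(loss(p, t) for p, t in pairs) / len(pairs)


# Toy data (hypothetical). The optimization problem: pick the single
# constant estimate that minimizes the mean squared error on this data.
data = [2.0, 4.0, 9.0]
candidates = [i / 10 for i in range(0, 101)]  # 0.0, 0.1, ..., 10.0
best = min(
    candidates,
    key=lambda c: mean_loss(squared_error, [c] * len(data), data),
)
print(best)  # the grid point closest to the sample mean of the data
```

Under squared-error loss the minimizing constant is the sample mean, which is why the grid search lands on the mean of the toy data.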
Long-Tailed Distribution in statistics and business refers to the long tail of a distribution: the portion having a large number of occurrences far from the “head” or central part of the distribution. The term is often used loosely, with no definition or an arbitrary definition, but precise definitions are possible. Broadly speaking, for many population distributions, the majority of occurrences (more than half, and where the Pareto principle applies, 80%) are accounted for by the first 20% of items in the distribution. What is unusual about a long-tailed distribution is that the most frequently occurring 20% of items represent less than 50% of occurrences; in other words, the least frequently occurring 80% of items are more important as a proportion of the total population. The long tail concept has found some ground for application, research, and experimentation. It is a term used in online business, mass media, micro-finance, user-driven innovation, social network mechanisms (e.g. crowdsourcing, crowdcasting, peer-to-peer), economic models, and marketing.
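The 20% criterion described above can be checked with a few lines of Python. This is a rough sketch: the helper name `head_share` and both count lists are made-up illustrations, not real data.

```python
def head_share(counts, head_fraction=0.2):
    """Fraction of all occurrences accounted for by the most frequent items.

    `counts` holds one occurrence count per item; `head_fraction` is the
    share of items (by rank) treated as the "head" of the distribution.
    """
    ranked = sorted(counts, reverse=True)
    head_size = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / sum(ranked)


# Hypothetical counts for ten items each (both lists sum to 100).
pareto_like = [50, 30, 4, 4, 3, 3, 2, 2, 1, 1]
long_tailed = [20, 15, 10, 10, 10, 9, 8, 7, 6, 5]

print(head_share(pareto_like))  # top 20% of items hold most occurrences
print(head_share(long_tailed))  # top 20% of items hold under half
```

In the second list the remaining 80% of items carry most of the occurrences, which is the situation the entry describes as long-tailed.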
Long Short-Term Memory networks, usually just called “LSTMs”, are a special kind of RNN, capable of learning long-term dependencies. LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is their default behavior. All recurrent neural networks have the form of a chain of repeating modules of a neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer. LSTMs also have this chain-like structure, but the repeating module has a different structure. The key to LSTMs is the cell state, which acts like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to flow along it unchanged. The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates. The first step in an LSTM is to decide what information it is going to throw away from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” The next step is to decide what new information we’re going to store in the cell state. The last step is a decision on output.
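The three steps above can be sketched for a single scalar LSTM unit in plain Python. This follows the commonly used formulation with forget, input, and output gates; the parameter layout, the gate names other than “forget”, and the example weights are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a one-unit LSTM cell.

    `params` maps a gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state h and the new cell state c.
    """
    def layer(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = layer("forget", sigmoid)        # step 1: what to discard from the cell state
    i = layer("input", sigmoid)         # step 2a: how much new information to admit
    g = layer("candidate", math.tanh)   # step 2b: the candidate new information
    o = layer("output", sigmoid)        # step 3: what part of the cell state to expose

    c = f * c_prev + i * g              # the "conveyor belt": only linear interactions
    h = o * math.tanh(c)
    return h, c


# Hypothetical weights: forget gate saturated open, input gate saturated shut,
# so the cell state should pass through almost unchanged.
params = {
    "forget": (0.0, 0.0, 100.0),
    "input": (0.0, 0.0, -100.0),
    "candidate": (1.0, 1.0, 0.0),
    "output": (0.5, 0.5, 0.0),
}
h, c = lstm_step(x=0.3, h_prev=0.1, c_prev=0.7, params=params)
print(h, c)
```

With the forget gate near 1 and the input gate near 0, the new cell state stays essentially equal to the previous one, which illustrates how information can flow along the cell state unchanged.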
Log-Normal Distribution in probability theory is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Likewise, if Y has a normal distribution, then X = exp(Y) has a log-normal distribution. A random variable which is log-normally distributed takes only positive real values. The distribution is occasionally referred to as the Galton distribution or Galton’s distribution, after Francis Galton. A log-normal process is the statistical realization of the multiplicative product of many independent random variables, each of which is positive. This is justified by considering the central limit theorem in the log domain. The log-normal distribution is the maximum entropy probability distribution for a random variate X for which the mean and variance of ln(X) are specified.
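The relationship between the normal and log-normal distributions can be illustrated by exponentiating normal draws. The sketch below uses only Python's standard library; the parameter values, sample size, and seed are arbitrary choices for illustration.

```python
import math
import random


def sample_lognormal(mu, sigma, n, seed=0):
    """Draw n log-normal values by exponentiating normal draws Y ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]


samples = sample_lognormal(mu=1.0, sigma=0.5, n=10_000)

# Every sample is positive, and taking logarithms recovers the normal draws,
# so the mean of the logs should be close to mu.
logs = [math.log(x) for x in samples]
log_mean = sum(logs) / len(logs)
print(min(samples) > 0, log_mean)
```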
Euclidean distance in mathematics is the “ordinary” (i.e. straight-line) distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space. The associated norm is called the Euclidean norm. Older literature refers to the metric as a Pythagorean metric. A generalized term for the Euclidean norm is the L2 norm or L2 distance. The Euclidean distance between points p and q is the length of the line segment connecting them. In Cartesian coordinates, if p = (p1, p2,…, pn) and q = (q1, q2,…, qn) are two points in Euclidean n-space, then the distance (d) from p to q, or from q to p, is given by the Pythagorean formula: d(p, q) = d(q, p) = sqrt((q1 − p1)^2 + (q2 − p2)^2 + … + (qn − pn)^2).
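The Pythagorean formula translates directly into code. The following is a minimal sketch; the function name is illustrative, and Python 3.8+ also provides `math.dist` for the same computation.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points given as equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


print(euclidean_distance((0, 0), (3, 4)))  # the classic 3-4-5 right triangle
```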
Estimation is the process of finding an estimate, or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable. The value is nonetheless usable because it is derived from the best information available. Typically, estimation involves “using the value of a statistic derived from a sample to estimate the value of a corresponding population parameter”. The sample provides information that can be projected, through various formal or informal processes, to determine a range most likely to describe the missing information. An estimate that turns out to be incorrect will be an overestimate if the estimate exceeded the actual result or an underestimate if the estimate fell short of the actual result. Estimation is often done by sampling, which is counting a small number of examples and projecting that number onto a larger population. Estimates can similarly be generated by projecting results from polls or surveys onto the entire population. Estimation is important in business and economics because too many variables exist to figure out how large-scale activities will develop.
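Estimation by sampling, as described above, can be sketched as follows: measure a small random sample, then project the sample mean onto the whole population. The function names and the synthetic population are assumptions made purely for illustration.

```python
import random


def estimate_population_total(population, sample_size, seed=0):
    """Estimate the population total from the mean of a random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    sample_mean = sum(sample) / sample_size
    return sample_mean * len(population)


def classify_estimate(estimate, actual):
    """Label an estimate relative to the actual result."""
    if estimate > actual:
        return "overestimate"
    if estimate < actual:
        return "underestimate"
    return "exact"


# Synthetic population: the integers 1..1000, whose true total is 500500.
population = list(range(1, 1001))
actual_total = sum(population)
estimate = estimate_population_total(population, sample_size=50)
print(estimate, classify_estimate(estimate, actual_total))
```

Larger samples generally bring the estimate closer to the true total; sampling the entire population reproduces it exactly.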
Eigenvectors are a special set of vectors associated with a linear system of equations that are sometimes also known as characteristic vectors, proper vectors, or latent vectors. The determination of the eigenvectors and eigenvalues of a system is extremely important in physics and engineering, where it is equivalent to matrix diagonalization and arises in such common applications as stability analysis, the physics of rotating bodies, and small oscillations of vibrating systems, to name only a few. Each eigenvector is paired with a corresponding so-called eigenvalue. Mathematically, two different kinds of eigenvectors need to be distinguished: left eigenvectors and right eigenvectors.
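The distinction between right and left eigenvectors can be illustrated with power iteration, one simple way to approximate the dominant eigenpair. A right eigenvector v satisfies A v = λ v, while a left eigenvector u satisfies u A = λ u, which is the same as u being a right eigenvector of the transpose of A. The sketch below is pure Python; the helper names and the example matrix are chosen only for illustration.

```python
import math


def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def power_iteration(A, iterations=100):
    """Approximate the dominant eigenvalue and a unit right eigenvector of A."""
    v = [1.0] + [0.0] * (len(A) - 1)
    for _ in range(iterations):
        w = mat_vec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # For a unit vector v that is (nearly) an eigenvector, v . (A v) gives lambda.
    eigenvalue = sum(a * b for a, b in zip(mat_vec(A, v), v))
    return eigenvalue, v


A = [[4.0, 1.0],
     [2.0, 3.0]]  # eigenvalues 5 and 2

lam_right, v = power_iteration(A)            # right eigenvector, proportional to (1, 1)
lam_left, u = power_iteration(transpose(A))  # left eigenvector, proportional to (2, 1)
print(lam_right, v)
print(lam_left, u)
```

Both runs recover the same eigenvalue but different vectors, because A is not symmetric; for a symmetric matrix the left and right eigenvectors coincide.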