Predictive Modeling is a process through which a future outcome or behavior is predicted based on the past and current data at hand. It is a statistical analysis technique that enables the evaluation and calculation of the probability of certain results. Predictive modeling works by collecting data, creating a statistical model and applying probabilistic techniques to predict the likely outcome.

# Tag: data science glossary

## What is Precision?

Precision is the proportion of positive predictions that are actually correct.

The formula is True Positives / (True Positives + False Positives).

Note that the denominator is the count of all positive predictions, including those made for cases that were, in fact, negative (false positives).
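
As a minimal sketch of the formula above, the snippet below counts true and false positives from two lists of binary labels. The function name `precision`, the convention that `1` marks the positive class, and the sample labels are assumptions made for illustration.

```python
def precision(y_true, y_pred):
    """Return TP / (TP + FP) for binary labels, where 1 marks the positive class."""
    # True positives: predicted positive and actually positive.
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    # False positives: predicted positive but actually negative.
    false_positives = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("precision is undefined when there are no positive predictions")
    return true_positives / predicted_positives


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]
result = precision(y_true, y_pred)  # 3 true positives, 1 false positive
```

The missed positive (the fourth case) does not affect the result, because precision only looks at the cases that were predicted positive.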

## What is Power Analysis?

Power Analysis is an important aspect of experimental design. It allows us to determine the sample size required to detect an effect of a given size with a given degree of confidence.

There are four parameters involved in a power analysis. The researcher must ‘know’ three and solve for the fourth.

1. Alpha:
   - Probability of finding significance where there is none
   - False positive
   - Probability of a Type I error
   - Usually set to .05

2. Power:
   - Probability of finding true significance
   - True positive
   - 1 - beta, where beta is:
     - Probability of not finding significance when it is there
     - False negative
     - Probability of a Type II error
   - Usually set to .80

3. N:
   - The sample size (usually the parameter you are solving for)
   - May be known and fixed due to study constraints

4. Effect size:
   - Usually, the ‘expected effect’ is ascertained from:
     - Pilot study results
     - Published findings from a similar study or studies
       - May need to be calculated from results if not reported
       - May need to be translated as design specific using rules of thumb
     - Field-defined ‘meaningful effect’
     - Educated guess (based on informal observations and knowledge of the field)
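
The idea of fixing three parameters and solving for the fourth can be sketched with a normal approximation. The example below assumes a two-sided comparison of two independent group means and a standardized effect size (Cohen's d); the function names are hypothetical, and an exact t-based calculation would typically give a slightly larger sample size.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Solve for N (per group) given alpha, power and effect size.

    Normal approximation for a two-sided, two-sample comparison of means.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the Type I error rate
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Solve for power given alpha, N and effect size (same approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n_per_group / 2) - z_alpha)


# Assumed inputs: a medium effect (d = 0.5), alpha = .05, power = .80.
n_needed = sample_size_per_group(effect_size=0.5)
achieved = approximate_power(effect_size=0.5, n_per_group=n_needed)
```

Either function can be read as one direction of the same relationship: the first treats N as the unknown, the second treats power as the unknown.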

## What is Paired t-Test?

The purpose of the Paired t-Test is to determine whether there is statistical evidence that the mean difference between paired observations on a particular outcome is significantly different from zero. The Paired-Samples t Test is a parametric test. It is also known as the Dependent t-Test.
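
A minimal sketch of the test statistic, assuming two equal-length lists of measurements taken on the same subjects: the statistic is the mean of the pairwise differences divided by its standard error. The function name and the data are made up for illustration, and the p-value lookup (which needs the t distribution with n - 1 degrees of freedom) is left out.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The test works on the per-pair differences, not on the raw values.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample standard deviation
    return mean(diffs) / standard_error, n - 1


# Made-up measurements on five subjects.
before = [10, 12, 9, 11, 13]
after = [11, 14, 12, 13, 15]
t_stat, dof = paired_t_statistic(before, after)
```

A large absolute value of `t_stat` relative to the t distribution with `dof` degrees of freedom is evidence that the mean difference is not zero.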

## What is Out-Of-Sample Evaluation?

Out-Of-Sample Evaluation means withholding some of the sample data from the model identification and estimation process, then using the model to make predictions for the held-out data. This shows how accurate the predictions are and whether the statistics of their errors are similar to those of the errors the model made within the sample of data that was fitted.
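
The procedure can be sketched as follows, assuming a simple least-squares line as the model and a hold-out of the last three observations. The data, the split point and the helper names are all illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(xs, ys, slope, intercept):
    """Average absolute prediction error of the fitted line on (xs, ys)."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)


# Made-up data: roughly y = 2x + 1 with small fixed perturbations.
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1, -0.15, 0.05]
xs = list(range(10))
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

# Withhold the last three points from estimation.
train_x, test_x = xs[:7], xs[7:]
train_y, test_y = ys[:7], ys[7:]

slope, intercept = fit_line(train_x, train_y)
in_sample_mae = mean_abs_error(train_x, train_y, slope, intercept)
out_of_sample_mae = mean_abs_error(test_x, test_y, slope, intercept)
```

Comparing `in_sample_mae` with `out_of_sample_mae` is the check described above: errors of similar size suggest the model generalizes to data it was not fitted on.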

## What is Multinomial Logistic Regression?

Multinomial Logistic Regression is the regression analysis to conduct when the dependent variable is nominal with more than two levels. Thus it is an extension of logistic regression, which analyzes dichotomous (binary) dependents. Since the output of the analysis is somewhat different from the logistic regression’s output, multinomial regression is sometimes used instead. Like other regression analyses, multinomial regression is a predictive analysis. Multinomial regression is used to describe data and to explain the relationship between one dependent nominal variable and one or more continuous-level (interval or ratio scale) independent variables.
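
To illustrate how a fitted multinomial logistic model turns predictors into class probabilities, the sketch below computes one linear score per class and passes the scores through the softmax function. The weights, biases and input are hypothetical values chosen for illustration, not the result of fitting any real data.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, weights, biases):
    """Class probabilities for one observation x under a multinomial logit model."""
    scores = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical coefficients for three classes and two continuous predictors.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
probs = predict_proba([2.0, 0.5], weights, biases)
predicted_class = probs.index(max(probs))
```

With two classes, this reduces to ordinary binary logistic regression, which is why the multinomial model is described as an extension of it.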

## What is Model Fitting?

Model Fitting is running an algorithm to learn the relationship between predictors and outcome so that you can predict the future values of the outcome.

It proceeds in three steps:

First, you need a function that takes in a set of parameters and returns a predicted data set.

Second, you need an ‘error function’ that provides a number representing the difference between your data and the model’s prediction for any given set of model parameters.

Third, you need to find the parameters that minimize this difference. Once you set things up properly, this third step is easy.
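
The three steps above can be sketched with a straight-line model, mean squared error as the error function, and plain gradient descent as the minimizer. These choices, the learning rate, the step count and the data are assumptions for illustration; any other model, error measure or optimizer fits the same pattern.

```python
def predict(params, xs):
    """Step 1: take a set of parameters and return a predicted data set."""
    slope, intercept = params
    return [slope * x + intercept for x in xs]


def error(params, xs, ys):
    """Step 2: one number measuring the gap between data and prediction (MSE)."""
    preds = predict(params, xs)
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def fit(xs, ys, learning_rate=0.05, steps=2000):
    """Step 3: find the parameters that minimize the error (gradient descent)."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        preds = predict((slope, intercept), xs)
        # Partial derivatives of the mean squared error.
        grad_slope = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_intercept = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
best = fit(xs, ys)
```

Because the model and error function are kept separate from the minimizer, swapping in a different model only requires changing `predict` and the gradient expressions.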

## What is Markov Model?

In probability theory, a Markov Model is a stochastic model used to model randomly changing systems where it is assumed that future states depend only on the current state, not on the events that occurred before it (this assumption is the Markov property). Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. For this reason, in the fields of predictive modeling and probabilistic forecasting, it is desirable for a given model to exhibit the Markov property. The four most common Markov models are used in different situations, depending on whether every sequential state is observable and whether the system is to be adjusted on the basis of observations made.

These are the Markov chain (the simplest model), the Hidden Markov Model (a Markov chain in which only part of the state is observable), the Markov decision process (a chain with an applied action vector) and the Hidden Markov decision process (more commonly called a partially observable Markov decision process). In addition, a Markov random field, or Markov network, may be considered a generalization of a Markov chain in multiple dimensions, and Hierarchical Markov Models can be applied to categorize human behavior at various levels of abstraction.
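
A minimal sketch of the simplest case, a Markov chain, is shown below. The next-state distribution is computed from the current distribution and a transition table alone, which is the Markov property in action. The two weather states and their transition probabilities are made up for illustration.

```python
# Hypothetical transition probabilities: transition[current][next].
transition = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def step(distribution, transition):
    """Advance a probability distribution over states by one time step."""
    next_distribution = {state: 0.0 for state in transition}
    for current, prob in distribution.items():
        for nxt, p in transition[current].items():
            # Only the current state matters, not how the chain got there.
            next_distribution[nxt] += prob * p
    return next_distribution


# Start from a day that is certainly sunny.
start = {"sunny": 1.0, "rainy": 0.0}
after_one = step(start, transition)
after_two = step(after_one, transition)
```

Repeated calls to `step` give the distribution any number of steps ahead without storing the history of earlier states.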

## What is Manhattan Distance?

Manhattan Distance is the distance between two points measured along axes at right angles. The name alludes to the grid layout of the streets of Manhattan, which determines the shortest path a car could take between two points in the city. When Manhattan Distance is used as a heuristic in sliding-tile puzzles, its limitation is that it considers each tile independently, while in fact, tiles interfere with each other.
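
In coordinates, the definition amounts to summing the absolute differences along each axis. The sketch below assumes points are given as equal-length tuples of numbers; the function name is chosen for illustration.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between points p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(p, q))


# Three blocks east and four blocks north.
d = manhattan_distance((1, 2), (4, 6))
```

For the same two points the straight-line (Euclidean) distance would be 5, which shows how the grid constraint lengthens the path.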

## What is MAE (Mean Absolute Error)?

MAE (Mean Absolute Error) is a quantity used in statistics to measure how close forecasts or predictions are to the eventual outcomes. The mean absolute error is the average of the absolute errors $|e_i| = |f_i - y_i|$, where $f_i$ is the prediction and $y_i$ is the true value:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |f_i - y_i| = \frac{1}{n} \sum_{i=1}^{n} |e_i|$$

Note that alternative formulations may include relative frequencies as weight factors. The mean absolute error uses the same scale as the data being measured. It is therefore a scale-dependent accuracy measure and cannot be used to make comparisons between series that use different scales. The mean absolute error is a common measure of forecast error in time series analysis, where the term “mean absolute deviation” is sometimes used in confusion with the more standard definition of mean absolute deviation. The same confusion exists more generally.
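
As a small sketch of the definition, the function below averages the absolute differences between predictions and true values. The function name and the sample numbers are made up for illustration, and the unweighted form is assumed.

```python
def mean_absolute_error(predictions, actuals):
    """Average of |prediction - true value| over all observations."""
    if len(predictions) != len(actuals):
        raise ValueError("inputs must have the same length")
    if not predictions:
        raise ValueError("inputs must not be empty")
    return sum(abs(f - y) for f, y in zip(predictions, actuals)) / len(predictions)


# Made-up forecasts and outcomes.
predictions = [2.5, 0.0, 2.0, 8.0]
actuals = [3.0, -0.5, 2.0, 7.0]
mae = mean_absolute_error(predictions, actuals)  # absolute errors: 0.5, 0.5, 0, 1
```

The result is expressed in the same units as the data, which is why it cannot be compared directly across series measured on different scales.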