he298c
2021-02-01
Answered

What is the correlation coefficient and what is its significance? Explain why the correlation coefficient is bounded between -1 and 1. How does the correlation coefficient show up in scatterplots? Aside from probabilistic models, explain what least squares line fitting is.

Demi-Leigh Barrera

Answered 2021-02-02
Author has **97** answers

Step 1

Correlation:

Correlation is a measure of the “go-togetherness” of two data sets and is denoted by r. The value of the correlation coefficient lies between –1 and +1. A value of +1 indicates a perfect linear relationship in which both data sets move in the same direction. A value of –1 indicates a perfect linear relationship in which they move in opposite directions. A value of 0 indicates that there is no linear relationship between the two data sets.
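
Why r cannot leave the interval from –1 to +1 can be sketched with the Cauchy-Schwarz inequality. The block below assumes the mean-centered form of Pearson's r, which is algebraically equivalent to the usual raw-sums formula. The symbols $a_i$ and $b_i$ are introduced only for this sketch, and $\bar{x}$, $\bar{y}$ denote the sample means.

```latex
\begin{align*}
a_i &= x_i - \bar{x}, \qquad b_i = y_i - \bar{y} \\
r &= \frac{\sum a_i b_i}{\sqrt{\sum a_i^2}\,\sqrt{\sum b_i^2}} \\
\Bigl(\sum a_i b_i\Bigr)^2 &\le \Bigl(\sum a_i^2\Bigr)\Bigl(\sum b_i^2\Bigr)
  \quad \text{(Cauchy-Schwarz)} \\
\Rightarrow\; r^2 &\le 1 \;\Rightarrow\; -1 \le r \le 1
\end{align*}
```

Equality holds only when $b_i = c\,a_i$ for some nonzero constant $c$, that is, when every point lies exactly on a straight line with nonzero slope: $r = +1$ if $c > 0$ and $r = -1$ if $c < 0$.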

Correlation coefficient, r:

Karl Pearson’s product-moment correlation coefficient, or simply Pearson’s correlation coefficient, is a measure of the strength of the linear association between two variables and is denoted by r or $r_{xy}$.

The coefficient of correlation $r_{xy}$ between two variables x and y for the bivariate data set $(x_i, y_i)$ for $i = 1, 2, 3, \dots, n$ is given below:

$r_{xy}=\frac{n\sum xy-(\sum x)(\sum y)}{\sqrt{[n\sum x^{2}-(\sum x)^{2}]\times [n\sum y^{2}-(\sum y)^{2}]}}$
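
As a minimal sketch of how this formula can be evaluated, the Python below computes r directly from the raw sums. The function name `pearson_r` and the `hours`/`score` data are hypothetical, made up only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r via the raw-sums (computational) formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # One of the variables is constant, so r is undefined.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical data, chosen only for illustration.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 4))  # about 0.7746, a fairly strong positive association
```

Note that the terms subtracted in the denominator are the squares of the sums, $(\sum x)^2$ and $(\sum y)^2$, not the sums of squares.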

Step 2

Scatterplot and correlation:

A scatterplot is a type of data display that shows the relationship between two numerical variables. Each member of the data set is plotted as a point whose (x, y) coordinates are its values for the two variables.

When the y variable tends to increase as the x variable increases, it can be said that there is a positive correlation between the variables. In other words, when the points on the scatterplot produce a lower left to upper right pattern, there is a positive correlation between the variables.

When the y variable tends to decrease as the x variable increases, it can be said that there is a negative correlation between the variables. In other words, when the points on the scatterplot produce an upper left to lower right pattern, there is a negative correlation between the variables.

When all the points on a scatterplot lie exactly on a straight line that is neither horizontal nor vertical, it can be said that there is a perfect correlation between the two variables (r = +1 or r = –1).

A scatterplot in which the points do not have a linear trend (either positive or negative) indicates a zero or near-zero correlation. The variables may still be related in a nonlinear way.
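
The patterns described above can be checked numerically. The sketch below uses small, made-up data sets (the names `rising`, `falling`, and `curved` are hypothetical) and a mean-centered helper for Pearson's r. It also illustrates that points following a symmetric curve can have zero correlation even though the variables are clearly related.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from mean-centered sums."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]

rising = [2 * v + 1 for v in x]     # exact line, lower left to upper right
falling = [10 - 3 * v for v in x]   # exact line, upper left to lower right
curved = [(v - 3) ** 2 for v in x]  # symmetric U shape, no linear trend

r_rising = pearson_r(x, rising)     # perfect positive correlation
r_falling = pearson_r(x, falling)   # perfect negative correlation
r_curved = pearson_r(x, curved)     # zero correlation despite a clear pattern

print(r_rising, r_falling, r_curved)
```

The third case is why a near-zero r should be read as "no linear trend" rather than "no relationship".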

Form of the association between variables:

The form of the association describes whether the data points follow a linear pattern or some more complicated curve. If it appears that a line would do a reasonable job of summarizing the overall pattern in the data, then the association between the two variables is linear.

Direction of association:

If the values of one variable tend to increase as the values of the other variable increase, then the direction is positive. If the values of one variable tend to decrease as the values of the other variable increase, then the direction is negative.

Strength of the association:

The association is said to be strong if the points lie close to a straight line, moderate if they are moderately close to it, and weak if they are widely scattered around it.

Step 3

Least squares line fitting:

Regression analysis estimates the relationship among variables. That is, it estimates the relationship between one dependent variable and one or more independent variables.

The general form of the first-order regression model is $y = \beta_0 + \beta_1 x + \epsilon$, where y is the dependent variable that is to be modelled or predicted, x is the independent variable that is used to predict the dependent variable, and $\epsilon$ is the error term. The value of y predicted by the fitted line is written $\hat{y}$.

The difference between the observed value of y and the predicted value of y, written $\hat{y}$, is called the residual. Hence, the residual is $y - \hat{y}$.

A straight line satisfies the least-squares property if the sum of the squares of its residuals is the smallest possible. The regression line is the line that satisfies this property, and in that sense it "best fits" the points in a scatterplot.
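
A minimal sketch of least-squares line fitting for one predictor follows. It assumes the standard closed-form estimates (slope equal to the sum of cross-deviations divided by the sum of squared x-deviations, intercept chosen so the line passes through the point of means). The helper names `fit_line` and `sse` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y_hat = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared residuals y - y_hat for a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
best = sse(xs, ys, b0, b1)

# Any other line gives a larger sum of squared residuals.
nudged_slope = sse(xs, ys, b0, b1 + 0.1)
nudged_intercept = sse(xs, ys, b0 + 0.1, b1)

print(round(b0, 3), round(b1, 3), round(best, 3))
print(nudged_slope > best, nudged_intercept > best)
```

Nudging either coefficient away from the fitted values increases the sum of squared residuals, which is the least-squares property in action.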

asked 2021-06-23

You were asked about advantages of using box plots and dot plots to describe and compare distributions of scores. Do you think the advantages you found would exist not only for these data, but for numerical data in general? Explain.

asked 2021-03-06

Find the regression line using the given points.

asked 2021-06-02

The two-way table summarizes data from an experiment comparing the effectiveness of three different diets (A, B, and C) on weight loss. Researchers randomly assigned 300 volunteer subjects to the three diets. The response variable was whether each subject lost weight over a 1-year period.

Suppose we randomly select one of the subjects from the experiment. Show that the events "Diet B" and "Lost weight" are independent.

asked 2021-01-04

Find the covariance $s_{xy}$ using bivariate data.

asked 2020-11-30

In bivariate data, we sometimes notice that one of the quantities increases (1, 2, 3...) while the other quantity decreases (20, 19, 18...). Which phrase best describes this association? Would it be no correlation, a perfect correlation, a positive correlation, or a negative correlation?

asked 2021-06-03

Suppose you were to collect data for each pair of variables. You want to make a scatterplot. Which variable would you use as the explanatory variable and which as the response variable? Why? What would you expect to see in the scatterplot? Discuss the likely direction, form, and strength. Gasoline: number of miles you drove since filling up, gallons remaining in your tank

asked 2020-10-23

Differentiate between univariate, bivariate and multivariate data.