Fully discuss whether we should omit a predictor from the modeling stage if it does not reflect any connection with the target variable in the EDA stage, and why.

Question
Modeling
asked 2020-12-30
Fully discuss whether we should omit a predictor from the modeling stage if it does not reflect any connection with the target variable in the EDA stage, and why.

Answers (1)

2020-12-31

Step 1 In simple linear regression, we model the relationship between two variables: a dependent variable and an independent variable. A scatter diagram is a graphical method for checking the relationship between the two variables. The simple linear regression equation is given by \(Y = \beta_0 + \beta_1X\), where Y is the dependent variable, X is the independent variable, \(\beta_0\) is the intercept of the regression line, and \(\beta_1\) is the slope of the regression line. In machine learning, the dependent variable (Y) is called the target variable, and the independent (or predictor) variable (X) is called a feature.
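As a minimal sketch of the equation above, the intercept and slope can be estimated by ordinary least squares. The function name `fit_simple_regression` and the data values are hypothetical, chosen only for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def fit_simple_regression(x, y):
    """Return (beta0, beta1) for the least-squares line y = beta0 + beta1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared deviations of x
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1

# Hypothetical data lying exactly on the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

beta0, beta1 = fit_simple_regression(x, y)
print("intercept:", beta0, "slope:", beta1)  # recovers beta0 = 1 and beta1 = 2
```

With real data the points will not lie exactly on a line, so a scatter plot of x against y is still the first check before trusting the fitted slope.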

Step 2 Yes, we should omit a predictor from the modeling stage if it does not reflect any connection with the target variable in the EDA stage, provided the lack of connection is genuine.

Exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often using data visualization. It was promoted by John Tukey. This step is important before you start machine learning or modeling on your data. EDA offers many graphical methods for checking the relationship between two variables, for example the scatter plot, multi-vari chart, run chart, and Pareto chart. Using these techniques, we check the relationship between each predictor and the target; if a predictor shows no relationship, we omit it.

Model specification is the process of determining which independent variables are included in and excluded from the regression equation. An investigator often measures many variables but includes only some of them, omitting those that show no relationship with the dependent (target) variable. This must be done with care, because a pairwise plot can miss a relationship that only appears once other predictors are accounted for. If the investigator omits an important variable that is correlated with the included predictors, the estimates for the included variables can be biased. This is known as omitted variable bias. A predictor that truly has no connection with the target does not cause this bias when it is dropped, and removing it keeps the model simpler, so we omit it.
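The difference between dropping an unrelated predictor and dropping an important one can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are synthetic, the names `x1`, `x2`, `x3` and the helper `ols` are hypothetical, and the true coefficients (2 for `x1`, 3 for `x2`, 0 for `x3`) are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 5000

# Synthetic predictors (assumed for illustration)
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # important and correlated with x1
x3 = rng.normal(size=n)                   # unrelated to the target
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(y, *predictors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# EDA-style check: x3 shows essentially no correlation with the target
corr_x3_y = np.corrcoef(x3, y)[0, 1]

full = ols(y, x1, x2, x3)   # all predictors
no_x3 = ols(y, x1, x2)      # unrelated predictor dropped
no_x2 = ols(y, x1, x3)      # important predictor dropped

print("corr(x3, y):", corr_x3_y)
print("x1 coefficient, full model:", full[1])
print("x1 coefficient, x3 omitted:", no_x3[1])  # stays near the true value 2
print("x1 coefficient, x2 omitted:", no_x2[1])  # drifts toward 2 + 3 * 0.8 = 4.4
```

In this setup, dropping `x3` leaves the `x1` estimate close to its true value, while dropping `x2` shifts it substantially, which is the omitted variable bias described above.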
