# Assumptions of the Classical Linear Regression Model (CLRM), Based on Secondary Data

## Detection, Illness, and Removal of Multicollinearity

A linear regression model is predicated on four assumptions:

- Linearity: there is a linear relationship between X and the mean of Y.
- Homoscedasticity: the variance of the residual is the same for all values of X.
- Independence: observations are independent of one another.
- Normality: Y is normally distributed for any fixed value of X.
2nd assumption of CLRM: none of the independent variables has a linear relationship with any other independent variable.

Above is the 2nd assumption of the Classical Linear Regression Model (CLRM). We detect violations by taking each independent variable in turn, treating it as the dependent variable, and regressing it on the remaining independent variables to obtain the coefficient of determination R². If such a model shows an R² of zero, the variance inflation factor equals 1. The formula is VIF = 1 / (1 − R²). Therefore, any value greater than 1 indicates some degree of multicollinearity; some statisticians treat a VIF greater than 5 as severe, while others treat a VIF greater than or equal to 10 as severe. In this report we will follow VIF ≥ 10 as the threshold for severity. The illness of the model is "inefficiency of the coefficients."
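The detection procedure above can be sketched in a few lines of code. This is a minimal illustration, not the report's actual data: the two regressors below are invented and deliberately correlated, so the VIF comes out well above the severity threshold.

```python
# Hypothetical data: two correlated regressors (illustrative numbers only).
x1 = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
x2 = [1.1, 2.3, 2.9, 4.2, 5.1, 5.8]  # roughly half of x1, plus noise

def r_squared(y, x):
    """R^2 from a simple OLS regression of y on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))
    alpha = my - beta * mx
    ss_res = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Regress one regressor on the other, then apply VIF = 1 / (1 - R^2).
r2 = r_squared(x2, x1)
vif = 1.0 / (1.0 - r2)
```

Because x2 tracks x1 almost perfectly, R² is close to 1 and the VIF far exceeds 10, so by the report's rule this pair of regressors signals severe multicollinearity.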

### Removal of Multicollinearity

The following methods are used to remove multicollinearity:

1. If the variance inflation factor (VIF) is less than 10, leave the model alone.

2. If VIF is greater than or equal to 10, then use one of the following methods:

A. Exclude the variable (however, only a control variable can be excluded).

B. Change the measure (e.g., use the growth rate of FDI instead of FDI in dollar terms).

C. Increase the sample size.

## Detection, Illness, and Removal of Autocorrelation

4th assumption of CLRM: observations of the error term are independent of each other, i.e., they are not correlated with each other.

Above is the 4th assumption of the Classical Linear Regression Model (CLRM): if the error term is correlated with its own previous values, the problem of autocorrelation exists. An error is a mistake, and mistakes should be random; no one commits mistakes deliberately or intentionally. Suppose Z was an important variable, but due to a lack of literature review, or without reading and understanding the theory, the researcher forgot to include Z in the model; as a result of this omission, Z becomes part of the error term. When the error term shows runs of like signs (+ve, +ve, +ve, … or −ve, −ve, −ve, …) we call it positive autocorrelation, and when the signs alternate consistently (+ve, −ve, +ve, −ve, …) we call it negative autocorrelation. So we have to look at whether such patterns are present in the error terms.

### Illness

The significance of the coefficients becomes doubtful.

### Measure

We use the Durbin–Watson (DW) statistic to detect the autocorrelation problem. The range of DW values is 0 to 4.

If DW equals 2, there is no autocorrelation.

If DW is less than 2, there is positive autocorrelation.

If DW is greater than 2, there is negative autocorrelation.
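The DW statistic above can be computed directly from a residual series. The residual values below are invented for illustration; they are persistently positive, so the statistic lands well below 2, as the rule predicts.

```python
# A minimal sketch of the Durbin-Watson statistic, computed from a
# hypothetical residual series (values invented for illustration).
residuals = [0.5, 0.7, 0.6, 0.8, 0.4, 0.6, 0.5, 0.7]  # runs of like signs

def durbin_watson(e):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); ranges from 0 to 4."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

dw = durbin_watson(residuals)
# A value well below 2 signals positive autocorrelation.
```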

### Severity test

We use the serial correlation LM test to check the severity of the problem.

The null hypothesis is H0: there is no autocorrelation. If the p-value is greater than 0.05, we fail to reject the null hypothesis and conclude that there is no autocorrelation.

If the p-value is less than or equal to 0.05, we reject the null hypothesis and conclude that there is autocorrelation.

### Removal

1. Addition of relevant variables. Remember, it is not necessarily the case that adding one variable will solve the problem; you may need to add two, three, or more variables to the model. How do we know the relevant variables? Firstly, we need to know the theory, and secondly, we need to review the literature.

2. Cochrane–Orcutt procedure

3. Adding an AR(1) term

4. HAC (heteroscedasticity- and autocorrelation-consistent) standard errors

## Detection, Illness, and Removal of Heteroscedasticity

6th assumption of CLRM: the error term has constant variance.

Above is the 6th assumption of the Classical Linear Regression Model (CLRM), according to which the variance of the error term should be constant across different groups. In this example we have taken data on 31 countries, consisting of low-income, medium-income, and high-income countries. It is cross-sectional data, i.e., the data pertain to the year 1992 and various entities (31 countries) with their respective GDPs and consumption. The illness of this model is that when the variance is non-constant (i.e., heteroskedasticity exists), significance becomes doubtful.
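A quick way to see the problem is to compare residual variances across the income groups, in the spirit of the 31-country example. The residual values below are invented for illustration, not taken from the report's data.

```python
# Hedged sketch: compare residual variance across income groups.
# All residual values are invented for illustration.
from statistics import pvariance

residuals_by_group = {
    "low-income":    [0.2, -0.1, 0.15, -0.2, 0.1],
    "medium-income": [0.8, -0.6, 0.7, -0.9, 0.5],
    "high-income":   [2.1, -1.8, 2.4, -2.0, 1.9],
}

# Population variance of the residuals within each group.
group_variance = {g: pvariance(r) for g, r in residuals_by_group.items()}
# Widely differing variances across groups point to heteroscedasticity.
```

Here the high-income residuals spread far more than the low-income ones, so the constant-variance assumption would clearly fail for this toy sample.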

## Understanding the Concept of the Functional Form of Regression

As we know, OLS's first assumption that the "model should be linear in parameters" applies to the parameters, not to the variables. For example, Y = α + β1X + e is linear in the variables, while Y = α + β1X + β2X² is non-linear in the variables; since the latter is non-linear in a variable but linear in the parameters, OLS can still estimate it. However, if the functional form of the regression equation is non-linear, then we need to apply a non-linear form, like TP = 6L² − 0.4L³, where TP is the total product of labour and L is the number of labourers used in production.
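As a worked check of the cubic total-product function quoted above, the sketch below evaluates TP = 6L² − 0.4L³ over a grid of labour inputs and locates its peak (analytically, dTP/dL = 12L − 1.2L² = 0 gives L = 10).

```python
# The cubic total-product function from the text: TP = 6L^2 - 0.4L^3.
def total_product(L):
    return 6 * L ** 2 - 0.4 * L ** 3

# Scan integer labour inputs and keep the one with the highest output.
best_L = max(range(0, 16), key=total_product)
peak_output = total_product(best_L)
# Output peaks at L = 10 labourers, where TP = 6*100 - 0.4*1000 = 200.
```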

### Various Non-linear Forms of Regression

1. Inverted U-shaped curve (it has a peak) or non-inverted U-shaped curve (it has a trough):

   1. If B1 > 0 and B2 < 0, it is an inverted U-shaped curve (it has a peak).

   2. If B1 < 0 and B2 > 0, it is a non-inverted U-shaped curve (it has a trough).

2. Exponential growth, like Y = B1·X^B2 with a multiplicative error term; we can transform it by taking the log of both sides of the equation. Since the log-form equation has already been discussed in the previous report on the Log in the Regression Model, we will concentrate on the inverted U-shaped curve.

### How do we know the functional form of the regression equation?

Firstly, theory guides us to use the non-linear functional form; for example, the Laffer curve says that the relationship between tax revenue and the tax rate is non-linear.

Secondly, there is a test known as the Ramsey RESET test: if we include the square of the estimated ŷ as an additional independent variable and it shows a significant value, then we have to apply a non-linear form of the regression model.

## STATIONARY SERIES AND UNIT ROOT TEST

The concept of stationary and non-stationary series includes the ADF test (Augmented Dickey–Fuller test). The following are typical characteristics of a stationary series:

1. Constant mean

2. Constant variance

3. Autocovariance that does not depend on time

The following three methods are used to detect whether a series is stationary or non-stationary:

1. Graphical method

2. Autocorrelation function (ACF)

3. DF, or Dickey–Fuller, test
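The "constant mean" characteristic can be illustrated with simulated data (a sketch under assumed data, not one of the formal tests above): a white-noise series keeps roughly the same mean over time, while a random walk drifts, because each value accumulates all past shocks.

```python
# Illustrative sketch: stationary white noise vs. a non-stationary random walk.
import random

random.seed(42)
shocks = [random.gauss(0, 1) for _ in range(500)]

white_noise = shocks          # stationary: y_t = u_t
random_walk = []              # non-stationary: y_t = y_{t-1} + u_t
level = 0.0
for u in shocks:
    level += u
    random_walk.append(level)

def half_means(series):
    """Mean of the first half vs. mean of the second half of the series."""
    n = len(series) // 2
    return sum(series[:n]) / n, sum(series[n:]) / n

wn_first, wn_second = half_means(white_noise)
rw_first, rw_second = half_means(random_walk)
# The white-noise halves have similar means; the random-walk halves do not.
```

This mirrors what the graphical method shows on a plot: the stationary series hovers around a fixed level, while the random walk wanders away from it.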

## Autoregressive Distributed lag model (ARDL) and Granger Causality with EViews

There are two techniques through which we understand the relationship between or among the variables namely,

1. Interdependence techniques (like Covariance & Correlation)

2. Dependence techniques (like regression, in which we know which variable depends on which; causation is also a dependence technique). If we regress a variable on its own lagged values, the effect that arises from those lagged values is treated as causation, i.e., in such a case the dependent variable is the effect of some cause. If we have a model in which the dependent and independent variables are the same series, but the independent variables enter with lagged values, and they are regressed on each other, then we call it autoregressive. The Autoregressive Distributed Lag model (ARDL) is used quite often by researchers to understand causality and its effect on the dependent variable. The test to determine causality in EViews is performed by Granger causality.
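The autoregressive idea above can be sketched with a toy first-order case: regress Y_t on its own lag Y_{t−1} by simple OLS. The series below is invented for illustration; in EViews the same lag structure underlies the ARDL and Granger-causality machinery.

```python
# Minimal AR(1)-style sketch: OLS of Y_t on Y_{t-1} (invented series).
y = [10.0, 10.8, 11.5, 11.9, 12.6, 13.1, 13.9, 14.4, 15.2, 15.7]

y_lag = y[:-1]   # Y_{t-1}
y_cur = y[1:]    # Y_t

n = len(y_cur)
mx, my = sum(y_lag) / n, sum(y_cur) / n
beta = (sum((x - mx) * (yv - my) for x, yv in zip(y_lag, y_cur))
        / sum((x - mx) ** 2 for x in y_lag))
alpha = my - beta * mx
# beta measures how strongly today's value depends on yesterday's;
# for this steadily rising series it comes out close to 1.
```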

## Understanding the Dummy Variables

In this example, data on daily average salary are taken from France, area-wise and gender-wise, at different age levels. We have 222 observations, where salary is the dependent variable and area and age are independent variables. According to this data, there are two scenarios:

1. How much do age and area change salary, irrespective of gender?

2. How much do age and area impact salary when gender (male and female) is considered as a dummy variable?
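Scenario 2 hinges on coding gender as a 0/1 dummy so it can enter the regression alongside age and area. The sketch below uses invented salary figures (the real report has 222 French observations); with this toy data, the dummy's effect is just the male/female gap at each age.

```python
# Hedged sketch of dummy-variable coding; all salary figures are invented.
observations = [
    {"age": 25, "gender": "male",   "salary": 120.0},
    {"age": 25, "gender": "female", "salary": 110.0},
    {"age": 40, "gender": "male",   "salary": 160.0},
    {"age": 40, "gender": "female", "salary": 150.0},
]

# D = 1 for male, 0 for female (the choice of base category is arbitrary).
rows = [(o["age"], 1 if o["gender"] == "male" else 0, o["salary"])
        for o in observations]

# With this toy data, the male-female gap at each age is the dummy's effect.
gap_at_25 = rows[0][2] - rows[1][2]
gap_at_40 = rows[2][2] - rows[3][2]
```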

## Use of Log in Regression Model

Basically, the mathematical function of the log is used for two reasons:

1. To normalize data if outliers exist in the data
2. To convert observations into percentages
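The second reason can be seen numerically: for small changes, the difference of logs approximates the percentage change, which is why log-form coefficients read as percentages. The series below is invented for illustration.

```python
# Sketch: log differences approximate percentage changes (invented series).
import math

gdp = [100.0, 103.0, 106.1, 109.3]   # a series growing roughly 3% per period

log_diffs = [math.log(b) - math.log(a) for a, b in zip(gdp, gdp[1:])]
pct_changes = [(b - a) / a for a, b in zip(gdp, gdp[1:])]
# Each log difference is close to the corresponding percentage change.
```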