Covariance Matrix
Introduction:
In mathematics and statistics, covariance is a measure of the relationship between two random variables. The metric evaluates how much, and to what extent, the variables change together. In other words, it is essentially a measure of the joint variability of two variables. However, the metric does not assess the dependency between the variables.
Before looking at covariance, it is important that we grasp the concept of variance. Variance measures the variation of a single variable (like the age of a student in a school), whereas covariance is a measure of how two random variables vary together (like the age of a student and the weight of a student in a school). In contrast to the correlation coefficient, covariance is measured in units. The units are calculated by multiplying the units of the two variables. The covariance can assume positive as well as negative values. The values are interpreted as follows:
Positive covariance: indicates that two variables tend to move in the same direction.
Negative covariance: indicates that two variables tend to move in opposite directions.
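This sign convention can be illustrated numerically (a minimal sketch; the variables and seed below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 1000)

# y tends to move with x -> positive covariance
y = 2 * x + rng.normal(0, 0.1, 1000)
# z tends to move against x -> negative covariance
z = -2 * x + rng.normal(0, 0.1, 1000)

print(np.cov(x, y)[0, 1] > 0)  # True: same direction
print(np.cov(x, z)[0, 1] < 0)  # True: opposite directions
```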
Formula of variance:

σ²(x) = (1/(n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)²

where
n is the number of samples (e.g. the number of people), and
x̄ is the mean of the random variable.
Formula of covariance:

σ(x, y) = (1/(n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ)
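Both the sample variance and the sample covariance can be implemented directly in Python as a sanity check (a minimal sketch; the age and weight values below are made up for illustration):

```python
def variance(x):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    n = len(x)
    xbar = sum(x) / n
    return sum((xi - xbar) ** 2 for xi in x) / (n - 1)

def covariance(x, y):
    """Sample covariance: paired deviation products, divided by n - 1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

ages = [18, 20, 22, 24]        # hypothetical student ages
weights = [55, 60, 64, 70]     # hypothetical student weights (kg)

print(variance(ages))             # 6.666...  (agrees with np.var(ages, ddof=1))
print(covariance(ages, weights))  # 16.333... (agrees with np.cov(ages, weights)[0, 1])
```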
Covariance Matrix:
From the covariance we can calculate the entries of the covariance matrix, which is a square matrix given by
Cᵢ,ⱼ = σ(xᵢ, xⱼ)
where C ∈ ℝᵈˣᵈ and d describes the dimension or number of random variables of the data (e.g. the number of features such as height, width, weight, …).
Also, the covariance matrix is symmetric, since σ(xᵢ, xⱼ) = σ(xⱼ, xᵢ).
The diagonal entries of the covariance matrix are the variances, while the off-diagonal entries are the covariances. For this reason, the covariance matrix is often called the variance-covariance matrix. The calculation of the covariance matrix can also be expressed as
C = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)(Xᵢ − X̄)ᵀ
where our data set is expressed by the matrix X ∈ ℝⁿˣᵈ. Following from this equation, the covariance matrix of a data set with zero mean can be computed as C = XᵀX / (n − 1), which is a positive semi-definite matrix. In this article, we'll focus on the two-dimensional case, but it can easily be generalized to higher-dimensional data.
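This identity can be checked numerically (a minimal sketch, assuming the rows-as-samples convention X ∈ ℝⁿˣᵈ; centring the data makes the zero-mean condition hold):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(0, 1, (500, 2))   # 500 samples, 2 features
Xc = X - X.mean(axis=0)          # centre the data: each feature now has zero mean

# For zero-mean data, the covariance matrix reduces to X^T X / (n - 1)
n = Xc.shape[0]
C = Xc.T @ Xc / (n - 1)

print(np.allclose(C, np.cov(X.T)))  # True: matches NumPy's covariance estimate
```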
We want to show how linear transformations affect the data set and, as a result, the covariance matrix. First, we'll generate random points with mean values x̄, ȳ at the origin and unit variance σx² = σy² = 1, which is also called white noise and has the identity matrix as its covariance matrix.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (12, 8)
# Normal distributed x and y vector with mean 0 and standard deviation 1
x = np.random.normal(0, 1, 500)
y = np.random.normal(0, 1, 500)
X = np.vstack((x, y)).T
plt.scatter(X[:, 0], X[:, 1])
plt.title('Generated Data')
plt.axis('equal');
This case would mean that x and y are independent (or uncorrelated) and the covariance matrix C is
C = | σx²  0   |
    | 0    σy² |
We can check this by calculating the covariance matrix:

# Covariance
def cov(x, y):
    xbar, ybar = x.mean(), y.mean()
    return np.sum((x - xbar)*(y - ybar))/(len(x) - 1)

# Covariance matrix
def cov_mat(X):
    return np.array([[cov(X[0], X[0]), cov(X[0], X[1])],
                     [cov(X[1], X[0]), cov(X[1], X[1])]])

# Calculate covariance matrix
cov_mat(X.T)  # (or with np.cov(X.T))

array([[ 1.008072  , -0.01495206],
       [-0.01495206,  0.92558318]])
This approximately gives us our expected covariance matrix with variances σx² = σy² = 1.
Covariance vs. Correlation
Covariance and correlation both primarily assess the relationship between variables. The closest analogy to the relationship between them is the relationship between the variance and the standard deviation. Covariance measures the total variation of two random variables from their expected values. Using covariance, we can only gauge the direction of the relationship (whether the variables tend to move in tandem or show an inverse relationship). However, it does not indicate the strength of the relationship, nor the dependency between the variables.

On the other hand, correlation measures the strength of the relationship between variables. Correlation is the scaled measure of covariance. It is dimensionless. In other words, the correlation coefficient is always a pure value and not measured in any units.
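This scaling relationship can be verified numerically (a minimal sketch with made-up data: correlation is the covariance divided by the product of the standard deviations, and it is unaffected by a change of units):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 1000)
y = 3 * x + rng.normal(0, 1, 1000)

# Correlation is the covariance scaled by the standard deviations
cov_xy = np.cov(x, y)[0, 1]
corr_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(np.isclose(corr_xy, np.corrcoef(x, y)[0, 1]))  # True

# Rescaling y (e.g. changing its units) changes the covariance,
# but leaves the dimensionless correlation unchanged
print(np.isclose(corr_xy, np.corrcoef(x, 100 * y)[0, 1]))  # True
```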
Example of Covariance
John is an experienced investor. His portfolio principally tracks the performance of the S&P 500, and John wants to add the stock of ABC Corp. Before adding the stock to his portfolio, he wants to assess the directional relationship between the stock and the S&P 500. John does not want to increase the unsystematic risk of his portfolio. Thus, he is not interested in owning securities in the portfolio that tend to move in the same direction. John can calculate the covariance between the stock of ABC Corp. and the S&P 500 by following the steps below:
1. Obtain the data.
First, John obtains the figures for both ABC Corp. stock and the S&P 500. The prices obtained are summarized in the table below:
2. Calculate the average price of each security.
3. Find the difference between each price and the mean price of the security.
4. Multiply the results obtained in the previous step.
5. Using the products calculated in step 4, find the covariance.
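The steps above can be sketched in Python. Note that the article's original price table was an image and is not reproduced here, so the prices below are hypothetical placeholders, not John's actual figures:

```python
# Step 1: obtain the data (hypothetical closing prices for illustration only)
abc_prices = [58.0, 59.5, 57.2, 60.1, 61.3]
sp500_prices = [4200.0, 4230.0, 4180.0, 4260.0, 4290.0]
n = len(abc_prices)

# Step 2: calculate the average price of each security
abc_mean = sum(abc_prices) / n
sp_mean = sum(sp500_prices) / n

# Step 3: find the difference between each price and its mean
abc_dev = [p - abc_mean for p in abc_prices]
sp_dev = [p - sp_mean for p in sp500_prices]

# Step 4: multiply the paired differences
products = [a * s for a, s in zip(abc_dev, sp_dev)]

# Step 5: sum the products and divide by n - 1 to obtain the sample covariance
covariance = sum(products) / (n - 1)
print(covariance)  # positive here: the two securities tend to move together
```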
Conclusion: Throughout the article we have tried to understand the basic concepts of covariance and then dived into the depths of the topic. Understanding the difference between correlation and covariance helps us grasp the subject better, and the formulas and derivations of these concepts help us understand them from a statistical point of view. In the example, we learn how to calculate covariance and also understand the covariance matrix at a deeper level.