PCA explained with a narrative

February 15, 2019 DATAcated Challenge

Continued criticism from the liberals of Essos compelled Robert Baratheon of King’s Landing to summon the education ministry to discuss an important aspect of the matriculation exam in Westeros. The objective of the meeting was to reduce the number of subjects in the exam. The education system tests students in ten subjects: “English Literature”, “English Grammar”, “Valyrian”, “History”, “Political Science”, “Geography”, “Physics”, “Chemistry”, “Biology” and “Mathematics”. Examining students in 10 academic subjects at such a young age had suddenly become a cause for worry. Robert asks the ministry to find a way to reduce the number of examination subjects from 10 to 5.

For the analysis, Robert gave them access to the scores of all previous students who had sat the matriculation exam.

Principal Component Analysis (PCA) is a dimension-reduction tool that is used to reduce a large set of variables to a small set that still carries most of the information in the large set.

Petyr Baelish suggests that students be allowed to choose any 5 of the 10 examinations. Everyone rejects the idea, because each subject has an importance of its own.

PCA does not drop any existing feature to reduce dimension. So, how does PCA reduce dimension?

Tyrion comes up with an idea. Instead of dropping a few subjects, he proposes creating a few new subjects, each of which tests a student’s knowledge of several (cor)related subjects. For example, a new subject called “English” would be used to judge a student’s knowledge of both “English Literature” and “English Grammar”. So, instead of sitting two separate exams, a student would sit a single exam called “English”. Similarly, for the correlated subjects “Physics”, “Chemistry” and “Biology”, he comes up with a new subject called “Science”, and for the correlated subjects “History”, “Political Science” and “Geography”, a new subject called “Social Science”.

He used the term principal components to describe these new subjects: English, Science and Social Science.

PCA creates new features (called principal components), each of which summarises the data of two or more correlated features. The old features can then be discarded. The data is summarised in such a way that the values of the old features can be recovered from the principal components: exactly if all the components are kept, and approximately if only the first few are.
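To make the idea of summarising and recovering concrete, here is a minimal Python sketch. The marks and the direction (0.6, 0.8) are made-up assumptions for illustration only; in a real analysis PCA would estimate the direction from the data. Each student's two marks are squeezed into one score, and the two marks are then approximated again from that single score.

```python
# Hypothetical marks (English Literature, English Grammar) for five students.
marks = [(62, 70), (75, 85), (48, 55), (90, 97), (55, 68)]

# Assumed unit-length direction of the "English" component (0.6^2 + 0.8^2 = 1).
# PCA would normally estimate this from the data; it is fixed here for clarity.
direction = (0.6, 0.8)

n = len(marks)
mean = (sum(m[0] for m in marks) / n, sum(m[1] for m in marks) / n)


def to_component(point):
    """Summarise one student's two marks as a single score along `direction`."""
    return sum((p - mu) * d for p, mu, d in zip(point, mean, direction))


def from_component(score):
    """Approximate the original two marks from the single score."""
    return tuple(mu + score * d for mu, d in zip(mean, direction))


for point in marks:
    score = to_component(point)
    approx = from_component(score)
    print(point, "->", round(score, 2), "->", tuple(round(a, 1) for a in approx))
```

The recovered marks match the originals only for students whose marks lie exactly on the chosen line; for everyone else they are the closest point on that line.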

But what technique should be used to summarise two or more features into one?

Tyrion took a pen and a piece of paper and drew a scatter plot of the marks obtained by students in “English Literature” and “English Grammar”. Then he drew the straight line that best represents the scattered points. His idea is that the new subject “English” will be the linear combination of “English Literature” and “English Grammar” that this line represents.

e.g., English = 0.45 * (English Literature) + 0.55 * (English Grammar)

=> The English exam, out of 100 marks, will comprise 45 marks of English Literature and 55 marks of English Grammar.
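Tyrion's pen-and-paper line can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marks are invented, and the function name `first_component_2d` is hypothetical. It uses the closed-form angle of the direction of greatest variance for two variables. Strictly speaking, PCA weights have squares that sum to 1, so the sketch rescales them to sum to 1 to match the "marks out of 100" reading used in the story.

```python
import math

# Hypothetical marks (English Literature, English Grammar); not real data.
marks = [(62, 70), (75, 85), (48, 55), (90, 97), (55, 68), (70, 74)]


def first_component_2d(points):
    """Return the unit vector along the direction of greatest variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Angle between the first principal axis and the x axis.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


w_lit, w_gram = first_component_2d(marks)

# Rescale so the weights sum to 1 (valid here because both are positive,
# which is what positively correlated subjects produce).
share_lit = w_lit / (w_lit + w_gram)
share_gram = w_gram / (w_lit + w_gram)
print(f"English = {share_lit:.2f} * Literature + {share_gram:.2f} * Grammar")
```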

For Science, he drew three axes, one each for Physics, Chemistry and Biology, and plotted the marks obtained by students in these subjects as a three-dimensional scatter plot. Then he drew the straight line through the cloud of points that best represents them. He did the same for Social Science.

e.g., Science = 0.40 * Physics + 0.30 * Chemistry + 0.30 * Biology

=> Science would comprise 40 marks of Physics, 30 marks of Chemistry and 30 marks of Biology.

Similarly, Social Science would comprise 40 marks of Political Science, 30 marks of History and 30 marks of Geography.
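With three subjects there is no simple angle formula, but the same direction can be estimated numerically. The sketch below uses power iteration on the covariance matrix, which is one of several ways to find the first principal component and is not necessarily what a statistics library would do internally. The marks and the helper names `covariance_matrix` and `leading_direction` are assumptions made for this example.

```python
import math

# Hypothetical marks (Physics, Chemistry, Biology); not real data.
marks = [
    (78, 72, 70), (55, 60, 58), (90, 85, 88),
    (62, 58, 65), (45, 50, 48), (70, 75, 69),
]


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_direction(matrix, iterations=200):
    """Power iteration: repeatedly apply the matrix and renormalise."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        v = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


weights = leading_direction(covariance_matrix(marks))
shares = [w / sum(weights) for w in weights]  # rescaled to sum to 1
subjects = ("Physics", "Chemistry", "Biology")
print("Science =", " + ".join(f"{s:.2f} * {name}" for s, name in zip(shares, subjects)))
```

The rescaled shares play the role of the 40/30/30 split in the story, although the actual numbers depend entirely on the data.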

Robert was impressed with Tyrion’s analytical skills. And that’s how the 10-subject matriculation exam in Westeros was reduced to just 5 un(cor)related subjects.

To recap, principal component analysis (PCA) is a statistical procedure that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
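The "linearly uncorrelated" part of that definition can be checked directly. The following sketch, again on invented marks, rotates two correlated subjects onto their two principal axes and compares the covariance before and after. The rotation angle is the standard closed form for two variables; the `covariance` helper is written out by hand for this example.

```python
import math

# Hypothetical, correlated marks in two subjects; not real data.
marks = [(62, 70), (75, 85), (48, 55), (90, 97), (55, 68), (70, 74)]


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


xs = [m[0] for m in marks]
ys = [m[1] for m in marks]

# Angle of the first principal axis for two variables.
theta = 0.5 * math.atan2(
    2 * covariance(xs, ys), covariance(xs, xs) - covariance(ys, ys)
)
c, s = math.cos(theta), math.sin(theta)

# Rotate each student's marks onto the two principal axes.
pc1 = [c * x + s * y for x, y in marks]
pc2 = [-s * x + c * y for x, y in marks]

print("covariance of original subjects:", round(covariance(xs, ys), 3))
print("covariance of principal components:", round(covariance(pc1, pc2), 3))
```

The original subjects have a clearly positive covariance, while the covariance of the two principal components is zero up to floating-point rounding.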

By: Bishwaraj Dey

