PCA document reconstruction for email classification

Publication year: 2011
Source: Computational Statistics & Data Analysis, available online 1 October 2011
Authors: J.C. Gomez, M.-F. Moens

This paper presents a document classifier based on text content features and its application to email classification. We test the validity of a classifier that uses Principal Component Analysis Document Reconstruction (PCADR). The idea is that principal component analysis (PCA) can optimally compress only the kind of documents (in our experiments, email classes) that are used to compute the principal components (PCs), and that other kinds of documents will not be compressed well using only a few components. Thus, the classifier computes the PCA separately for each document class. When a new instance arrives to be classified, it is projected onto the set of PCs computed for each class and then reconstructed using the same PCs.
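The project-then-reconstruct step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract as quoted here stops before stating the decision rule, so the sketch assumes the class whose PCs give the smallest reconstruction error is chosen. The function names, the toy feature vectors, and the labels "spam" and "ham" are all hypothetical.

```python
import numpy as np

def fit_class_pca(X, n_components):
    """Return (mean, components) for one class; rows of X are document vectors."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def reconstruction_error(x, mean, components):
    """Project x onto a class's PCs, reconstruct it, and return the residual norm."""
    centered = x - mean
    coords = components @ centered         # coordinates in the PC space
    reconstructed = components.T @ coords  # back to the original feature space
    return float(np.linalg.norm(centered - reconstructed))

def classify(x, models):
    """Assumed decision rule: pick the class with the smallest reconstruction error."""
    errors = {label: reconstruction_error(x, mean, comps)
              for label, (mean, comps) in models.items()}
    return min(errors, key=errors.get)

# Hypothetical toy term-count vectors: each class only uses its own two terms.
spam = np.array([[3, 1, 0, 0, 0, 0],
                 [1, 3, 0, 0, 0, 0],
                 [4, 2, 0, 0, 0, 0],
                 [2, 4, 0, 0, 0, 0]], dtype=float)
ham = np.array([[0, 0, 0, 0, 3, 1],
                [0, 0, 0, 0, 1, 3],
                [0, 0, 0, 0, 4, 2],
                [0, 0, 0, 0, 2, 4]], dtype=float)

# One PCA per class, as the abstract describes.
models = {
    "spam": fit_class_pca(spam, n_components=2),
    "ham": fit_class_pca(ham, n_components=2),
}

new_doc = np.array([2, 2, 0, 0, 0, 0], dtype=float)
predicted = classify(new_doc, models)
```

In this toy setup the new document lies in the subspace spanned by the "spam" PCs, so its reconstruction from those PCs is nearly exact, while the "ham" PCs leave a large residual. Real email features would be much higher dimensional, and the number of components per class would need tuning.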
