Short Text Classification with Novel Smoothing Kernels Based on Semantic Values of Terms
In the literature, classification studies on social network sites typically rely on traditional classifiers that ignore the semantic relations between words. As a result, they overlook natural-language phenomena such as synonymy, polysemy, multi-word expressions, and latent semantics.
This project, which is also a TÜBİTAK project (Project Type: 3001), aims to analyze and classify topic-based Turkish short texts from social network sites using novel semantic smoothing kernels for Support Vector Machines (SVM). The kernels are based on the semantic values of terms, obtained with methods such as Term Weighting with Abstract Features, Balinsky’s meaning method, Term Frequency-Relevance Frequency, and Sprinkling.
This semester, we implemented a novel semantic smoothing kernel for SVM and tested it on a dataset of English news titles to demonstrate the effect of using semantic values of terms. Next semester, we will improve this kernel and develop new ones. We will also build a Turkish tweet dataset and apply our model to it for topic-based classification.
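To make the idea of a semantic smoothing kernel concrete, the sketch below shows one simple form it can take: each term count is scaled by a per-term semantic value before the inner product is computed, so that k(d1, d2) = (d1 S)(d2 S)^T with a diagonal matrix S. This is an illustrative assumption, not the kernel implemented in the project. The vocabulary, the weight values, and all function names are hypothetical, and in practice the weights would come from one of the term-valuing methods named above.

```python
from typing import Dict, List


def term_vector(tokens: List[str], vocab: Dict[str, int]) -> List[float]:
    """Build a raw term-frequency vector over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def smooth(vec: List[float], weights: List[float]) -> List[float]:
    """Scale each term count by its semantic value (the diagonal of S)."""
    return [x * w for x, w in zip(vec, weights)]


def semantic_kernel(d1: List[float], d2: List[float],
                    weights: List[float]) -> float:
    """Inner product of the two smoothed document vectors."""
    s1, s2 = smooth(d1, weights), smooth(d2, weights)
    return sum(a * b for a, b in zip(s1, s2))


def gram_matrix(docs: List[List[float]],
                weights: List[float]) -> List[List[float]]:
    """Pairwise kernel values for all documents."""
    return [[semantic_kernel(a, b, weights) for b in docs] for a in docs]


# Hypothetical vocabulary and semantic values: topical terms are weighted
# up, a function word is weighted down. These numbers are made up.
vocab = {"match": 0, "goal": 1, "market": 2, "the": 3}
weights = [2.0, 2.0, 2.0, 0.5]

# Toy tokenized news titles (not from the project's dataset).
titles = [
    ["the", "match", "goal"],
    ["the", "goal"],
    ["the", "market"],
]
docs = [term_vector(t, vocab) for t in titles]
K = gram_matrix(docs, weights)
```

In this toy example the two sports titles end up with a much larger kernel value than a sports title paired with the finance title, because the shared function word contributes little after smoothing. A Gram matrix like `K` could then be handed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.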