Boun NLP: A Morphological Analysis System for Turkish

Morphological analysis is an important sub-task of natural language processing (NLP): it underlies tokenization, stemming, lemmatization and normalization. In NLP tasks where machine learning plays a central role, preprocessing the data is vital, and the success rate depends heavily on the preprocessing method used. This project proposes a tool for the morphological analysis of Turkish words, intended to serve as a baseline for further NLP projects. Turkish is an agglutinative language, so a single word can carry a long chain of suffixes: evlerimizden ("from our houses") is ev (house) + -ler (plural) + -imiz (our) + -den (from).

The project combines rule-based and machine learning methods. Data is gathered from the TDK (Türk Dil Kurumu, the Turkish Language Association) dictionary, yielding around 22,000 Turkish roots. Using this root dictionary together with a list of all Turkish suffixes, the algorithm proposes every possible parse of a word. A finite state machine (FSM) then acts as a filter: it simulates the rules of Turkish morphology and discards every parse that does not obey them.
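The generate-then-filter idea can be illustrated with a minimal Python sketch. It is not the project's code. The tiny root set, the suffix list, the tag names (PLU, POSS, ACC, ABL) and the FSM states are all hypothetical stand-ins for the TDK-derived dictionary and the full Turkish suffix inventory. To keep it short, suffix allomorphs (for example -ler/-lar) are listed explicitly rather than derived from vowel harmony, and each surface form gets a single tag. Candidate generation tries every root prefix and every way of splitting the remainder into known suffixes. The FSM then accepts only suffix sequences that follow the order plural, then possessive, then case.

```python
# Minimal sketch of dictionary-based parse generation plus FSM filtering.
# ROOTS, SUFFIXES, tag names and FSM states are hypothetical toy data,
# not the project's actual dictionary or rule set.

ROOTS = {"ev", "araba", "okul"}  # toy stand-in for ~22,000 TDK roots

# (surface form, tag); allomorphs listed explicitly instead of modeling
# vowel harmony. Real Turkish "-i" is also ambiguous (ACC vs. 3sg POSS).
SUFFIXES = [
    ("ler", "PLU"), ("lar", "PLU"),
    ("imiz", "POSS"), ("ımız", "POSS"), ("miz", "POSS"), ("mız", "POSS"),
    ("i", "ACC"), ("ı", "ACC"),
    ("den", "ABL"), ("dan", "ABL"),
]

# Toy morphotactics: root -> (plural) -> (possessive) -> (case).
TRANSITIONS = {
    ("root", "PLU"): "plural",
    ("root", "POSS"): "poss",
    ("root", "ACC"): "case",
    ("root", "ABL"): "case",
    ("plural", "POSS"): "poss",
    ("plural", "ACC"): "case",
    ("plural", "ABL"): "case",
    ("poss", "ACC"): "case",
    ("poss", "ABL"): "case",
}
ACCEPTING = {"root", "plural", "poss", "case"}


def segment_suffixes(rest):
    """Return every way to split `rest` into a sequence of known suffixes."""
    if rest == "":
        return [[]]
    results = []
    for surface, tag in SUFFIXES:
        if rest.startswith(surface):
            for tail in segment_suffixes(rest[len(surface):]):
                results.append([(surface, tag)] + tail)
    return results


def generate_parses(word):
    """Propose every (root, suffix sequence) pair, ignoring morphotactics."""
    parses = []
    for i in range(1, len(word) + 1):
        root = word[:i]
        if root in ROOTS:
            for suffixes in segment_suffixes(word[i:]):
                parses.append((root, suffixes))
    return parses


def fsm_accepts(tags):
    """Run the tag sequence through the FSM; reject on a missing transition."""
    state = "root"
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:
            return False
    return state in ACCEPTING


def format_parse(root, suffixes):
    return "+".join([root] + [f"{s}[{t}]" for s, t in suffixes])


def analyze(word):
    """Keep only the candidate parses that the FSM accepts."""
    return [
        format_parse(root, suffixes)
        for root, suffixes in generate_parses(word)
        if fsm_accepts([tag for _, tag in suffixes])
    ]


if __name__ == "__main__":
    print(len(generate_parses("evlerimizden")))  # 2 candidates
    print(analyze("evlerimizden"))  # ['ev+ler[PLU]+imiz[POSS]+den[ABL]']
```

For evlerimizden, generation yields two candidates: ev+ler+imiz+den and ev+ler+i+miz+den. The second places a case suffix before a possessive, so the FSM has no transition for it and rejects it. Keeping generation permissive and putting all ordering rules in the FSM means new rules only change the transition table, not the segmentation code.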