INGEOTEC is a research group formed by researchers from two research centers, CentroGEO and INFOTEC, and from Cátedras CONACYT.
INGEOTEC's research interest is text categorization seen as a supervised learning problem, that is, as a classification task. For this problem, we have developed two text modeling techniques that represent the text in a vector space model and use a Support Vector Machine as the classifier: B4MSA, a sentiment analysis classifier, and microTC, a general text classifier. In addition to this, we have been working on novel classifiers based on Genetic Programming (EvoDAG).
In sentiment analysis, author profiling, and text-image matching problems, we have participated in a number of competitions such as:
To facilitate and encourage the reproducibility of our research, we have decided to release our software under an open-source license at https://github.com/INGEOTEC. Our developments are implemented in Python, following continuous integration (using travis-ci.org), unit testing (using nose), and coverage measurement (using Coveralls).
B4MSA is a Python Sentiment Analysis Classifier for Twitter-like short texts. It can be used to create a first approximation to a sentiment classifier for any given language. It is almost language-independent, but it can take advantage of the particularities of a language.
It is written in Python and makes use of NLTK and scikit-learn to create compelling but straightforward sentiment classifiers.
microTC follows a minimalistic approach to text classification. It is designed to tackle text-classification problems in an agnostic way, being both domain and language independent. Currently, we only produce single-label classifiers, but support for multi-label problems is on the roadmap.
microTC is intentionally simple, so only a small number of features were implemented. However, it uses some complex tools from numpy and scikit-learn.
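The general idea shared by B4MSA and microTC, mapping each text to a vector and classifying it with a Support Vector Machine, can be sketched directly with scikit-learn. The snippet below is only an illustration of that idea and does not use the B4MSA or microTC APIs; the toy texts, the labels, and the choice of character n-grams with TF-IDF weighting are assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data, invented for illustration only.
texts = [
    "I love this phone, it is great",
    "what a wonderful and happy day",
    "great service, very good food",
    "I hate this phone, it is awful",
    "what a terrible and sad day",
    "bad service, very poor food",
]
labels = ["positive", "positive", "positive",
          "negative", "negative", "negative"]

# Vector space model: TF-IDF over character n-grams (an assumption of this
# sketch), followed by a linear Support Vector Machine.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), lowercase=True),
    LinearSVC(),
)
model.fit(texts, labels)

# Classify previously unseen short texts.
predictions = model.predict(["such a great day", "this is awful"])
print(list(predictions))
```

Character n-grams are used here because they need no language-specific tokenizer, which fits the language-independent goal described above; word-level features could be swapped in by changing the `analyzer` argument.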
EvoMSA is a Sentiment Analysis System based on B4MSA and EvoDAG. EvoMSA is a stacked generalization algorithm specialized in text classification problems. It works by combining the outputs of different text models to produce the final prediction.
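As a rough sketch of stacked generalization applied to text, the example below trains two text models, collects their out-of-fold decision values, and fits a second-level classifier on those values. It does not use the EvoMSA API: the two base models, the toy data, and the use of logistic regression as the combiner (standing in for EvoDAG) are all assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data, invented for illustration only.
texts = [
    "I love this phone", "what a wonderful day",
    "great service and good food", "a happy and lovely trip",
    "I hate this phone", "what a terrible day",
    "bad service and poor food", "a sad and awful trip",
]
labels = ["positive"] * 4 + ["negative"] * 4

# Two different text models (hypothetical choices for this sketch).
base_models = [
    make_pipeline(TfidfVectorizer(analyzer="word"), LinearSVC()),
    make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                  LinearSVC()),
]

# First level: out-of-fold decision values, one column per text model, so
# the combiner never sees outputs produced on a model's own training fold.
level_one = np.column_stack([
    cross_val_predict(m, texts, labels, cv=2, method="decision_function")
    for m in base_models
])

# Second level: a combiner trained on the first-level outputs.
combiner = LogisticRegression().fit(level_one, labels)

# Refit every base model on all the data for use at prediction time.
for m in base_models:
    m.fit(texts, labels)


def predict(new_texts):
    """Combine the base models' decision values into a final prediction."""
    features = np.column_stack(
        [m.decision_function(new_texts) for m in base_models]
    )
    return combiner.predict(features)


stacked = predict(["a lovely phone", "awful food"])
print(list(stacked))
```

Using out-of-fold predictions for the second level is the standard way to keep the combiner from simply trusting base models that overfit their training data.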
Evolving Directed Acyclic Graph (EvoDAG) is a steady-state Genetic Programming system with tournament selection. The main characteristic of EvoDAG is that the genetic operation is performed at the root. EvoDAG was inspired by the geometric semantic crossover proposed by Alberto Moraglio et al. and the implementation performed by Leonardo Vanneschi et al.
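The following pure-Python sketch illustrates what a steady-state loop with tournament selection and root-level operations can look like. It is not EvoDAG's implementation: the function set, the fitness measure, the toy regression target, and the use of a negative tournament to choose which individual is replaced are assumptions made for illustration.

```python
import math
import operator
import random

# Hypothetical function set placed at the root of each new individual.
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fitness(outputs, target):
    """Negative sum of absolute errors, so that higher is better."""
    error = sum(abs(o - t) for o, t in zip(outputs, target))
    return -error if math.isfinite(error) else -math.inf


def tournament(scores, k, rng, negative=False):
    """Index of the best (or, if negative, the worst) of k random individuals."""
    contenders = rng.sample(range(len(scores)), k)
    choose = min if negative else max
    return choose(contenders, key=lambda i: scores[i])


def evolve(columns, target, pop_size=20, steps=300, k=2, seed=0):
    rng = random.Random(seed)
    # An individual is (node, outputs). A node is an input index or a tuple
    # (function name, parent node, parent node); parents are shared, not
    # copied, so the nodes form a directed acyclic graph.
    population = []
    for _ in range(pop_size):
        j = rng.randrange(len(columns))
        population.append((j, [float(v) for v in columns[j]]))
    scores = [fitness(out, target) for _, out in population]
    initial_best = max(scores)
    for _ in range(steps):
        # Select two parents by tournament.
        a = population[tournament(scores, k, rng)]
        b = population[tournament(scores, k, rng)]
        # The genetic operation happens at the root: the offspring is a new
        # root function applied to the outputs of the selected parents.
        name = rng.choice(sorted(FUNCTIONS))
        func = FUNCTIONS[name]
        child = ((name, a[0], b[0]), [func(u, v) for u, v in zip(a[1], b[1])])
        # Steady state: the offspring immediately replaces one individual,
        # here the loser of a negative tournament (an assumption).
        victim = tournament(scores, k, rng, negative=True)
        population[victim] = child
        scores[victim] = fitness(child[1], target)
    best = max(range(pop_size), key=lambda i: scores[i])
    return population[best], scores[best], initial_best


# Toy regression problem: recover x0 * x1 + x0 from the two inputs.
x0 = [float(v) for v in range(-3, 4)]
x1 = [0.5 * v * v for v in x0]
target = [a * b + a for a, b in zip(x0, x1)]
best, best_score, initial_best = evolve([x0, x1], target)
print(best_score, initial_best)
```

Because each offspring only needs its parents' stored outputs, evaluating it costs one pass over the training data regardless of how deep the underlying graph has grown.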
EvoDAG is described in the following conference paper: EvoDAG: A semantic Genetic Programming Python library. Mario Graff, Eric S. Tellez, Sabino Miranda-Jiménez, Hugo Jair Escalante. 2016 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), pp. 1-6. A pre-print version can be downloaded from here.
The lectures below have been given in the Master's degree program in Data Science at INFOTEC Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación. The lectures Procesamiento de Información and Aprendizaje Computacional are given in the second semester. Clasificación de Texto corresponds to the third semester.
117 Circuito Tecnopolo Norte Col. Tecnopolo Pocitos II, C.P. 20313, Aguascalientes, Ags, México.
Tel. +52 (449) 994 51 50 Ext. 5251 and 5230
112 Circuito Tecnopolo Norte Col. Tecnopolo Pocitos II, C.P. 20313, Aguascalientes, Ags, México.
Tel. +52 (555) 624 28 00 Ext. 6315, 6353, 6313 and 6384