Dr. Mario Graff

Mario Graff is a Researcher of the National Council of Science and Technology (CONACYT) of Mexico, commissioned to INFOTEC, where he conducts research in Machine Learning, Evolutionary Computation (EC), and Natural Language Processing (NLP). His research deals with the application of EC, particularly Genetic Programming (GP), to supervised learning problems. Within this line of work, special attention has been paid to using GP to solve text classification and sentiment analysis problems posed as supervised learning tasks. In NLP, his interest is in developing multilingual text representations, in particular representations that facilitate knowledge transfer between languages.

In 2010, he obtained his Ph.D. from the School of Computer Science and Electronic Engineering at the University of Essex, U.K., under the supervision of Professor Riccardo Poli. His thesis was in the field of GP theory and aimed at developing performance models that can be used in practice. These models were later applied to time series forecasters and other supervised learning techniques.

Besides publishing his research in top-ranked journals and conferences, he accompanies each manuscript with the source code that implements the idea presented. The objective is to facilitate the replication needed in any research work. As a result, four software packages are publicly available at github.com, namely microTC, B4MSA, EvoMSA, and EvoDAG. All of them are implemented in Python and use continuous integration services, such as travis-ci.org and appveyor.com, to test them on Linux, OSX, and Windows. They can be installed with pip or conda.

INGEOTEC's research interest is text categorization seen as a supervised learning problem, that is, as a classification task. For this problem, we have developed two text modeling techniques that represent the text in a vector space model and use a Support Vector Machine as the classifier. These techniques are B4MSA, which is a sentiment analysis classifier, and microTC, a general text classifier. In addition to this, we have been working on novel classifiers based on Genetic Programming, namely EvoDAG.
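The general idea of representing text in a vector space and classifying it with a Support Vector Machine can be sketched with scikit-learn. This is only an illustration under stated assumptions: it does not use the B4MSA or microTC code or reproduce their text models, and the TF-IDF weighting, the linear kernel, and the toy texts and labels are all hypothetical choices made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus; a real task would use a labeled dataset.
texts = [
    "love great film",
    "wonderful happy day",
    "great friendly service, love it",
    "hate terrible movie",
    "awful sad night",
    "terrible rude staff, hate them",
]
labels = ["positive", "positive", "positive",
          "negative", "negative", "negative"]

# TfidfVectorizer maps each text to a weighted term vector (the vector
# space model); LinearSVC is a linear Support Vector Machine.
# Both choices are assumptions made for this sketch.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

# Predictions on the training texts and on one unseen text.
train_predictions = list(model.predict(texts))
new_prediction = model.predict(["what a great day"])[0]
print(train_predictions, new_prediction)
```

Wrapping both steps in a single pipeline keeps the vocabulary learned during training tied to the classifier, so new texts are vectorized in the same way before prediction.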

In sentiment analysis, author profiling, and text-image matching problems, we have participated in a number of competitions, such as:

  • IberEval'18 (Spanish, HAHA). INGEOTEC obtained 1st place in humor analysis (see Proceedings).
  • IberEval'18 (Spanish, MEX-A3T). INGEOTEC obtained 1st and 3rd place in the Aggressiveness detection and Author profiling tasks, respectively (see Proceedings).
  • PAN'18 (Arabic, English and Spanish). INGEOTEC obtained the 3rd place (23 participants) in global ranking (see Proceedings)
  • RedICA Text-Image Matching (RICATIM) Challenge. I3GO+ obtained the 1st place in the development and final phase (see Results).
  • TASS'17 (Spanish). INGEOTEC obtained the 1st place (11 teams) in Task 1 (General Corpus of TASS) (see Proceedings).
  • PAN'17 (Arabic, English, Portuguese and Spanish). INGEOTEC (Tellez et al.) obtained the 3rd place (22 participants) in global ranking (see Results)
  • SemEval'17 (English and Arabic). INGEOTEC obtained the 6th place (69 participants) in English (see Results) and 4th (18 participants) in Arabic (see Results).
  • SENTIPOLC'16 (Italian). INGEOTEC obtained 5th place (15 participants) in subjective classification and 9th (15 participants) in polarity classification (see Proceedings).
  • TASS'16 (Spanish). INGEOTEC obtained the 3rd place in 3 and 5 polarity levels (see Proceedings).
  • TASS'15 (Spanish). This was our first competition, where we obtained 12th place (17 participants) in 5 polarity levels and 10th (17 participants) in 3 polarity levels (see Proceedings).

Current Students

  • M.C. José Ortiz Bejar. Scholar Google
  • M.C. Claudia Nallely Sánchez Gómez.
  • M.C. Sergio Martín Nava Muñoz.

Past Students

  • Dr. Ranyart Rodrigo Suarez Ponce de Leon. Scholar Google
  • Dr. Noel Rodriguez Maya. Scholar Google
  • M.C. Jose Maria Valencia Ramirez (with Honors). Scholar Google
  • M.C. Jose Rafael Cedeño Gonzalez (with Honors). Scholar Google
  • M.C. Marco Antonio Pacheco Alvarez.
  • M.C. Eric Iturbide Diaz.
  • M.C. Marco Tulio Arreola Fernandez.

Software

To facilitate and encourage the reproducibility of our research, we make our software available under an open source license. Our developments are implemented in Python and rely on continuous integration (using travis-ci.org), unit testing (using nose), and coverage reporting (using Coveralls).

Evolving Directed Acyclic Graph (EvoDAG)

Evolving Directed Acyclic Graph (EvoDAG) is a steady-state Genetic Programming system with tournament selection. The main characteristic of EvoDAG is that the genetic operation is performed at the root. EvoDAG was inspired by the geometric semantic crossover proposed by Alberto Moraglio et al. and the implementation performed by Leonardo Vanneschi et al.
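A steady-state evolutionary loop with tournament selection, where each offspring is built by placing a new function at the root over existing individuals, can be sketched roughly as follows. This is a simplified illustration and not EvoDAG's implementation or API: the function names, the set of root functions, the error measure, and the use of a negative tournament to choose which individual is replaced are all assumptions made for the example. Individuals are stored only as their outputs on the training inputs.

```python
import math
import operator
import random


def fitness(outputs, target):
    """Negative sum of absolute errors; higher is better (assumed measure)."""
    if not all(math.isfinite(o) for o in outputs):
        return float("-inf")
    return -sum(abs(o - t) for o, t in zip(outputs, target))


def tournament(scores, size, rng, best=True):
    """Index of the best (or worst) among `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(scores)), size)
    pick = max if best else min
    return pick(contenders, key=lambda i: scores[i])


def steady_state_gp(inputs, target, pop_size=20, steps=200, tsize=2, seed=0):
    rng = random.Random(seed)
    root_functions = [operator.add, operator.sub, operator.mul]
    # Initial population: the input variable plus random constants.
    population = [list(inputs)]
    population += [[rng.uniform(-1, 1)] * len(inputs)
                   for _ in range(pop_size - 1)]
    scores = [fitness(ind, target) for ind in population]
    for _ in range(steps):
        # Select two parents by tournament.
        a = tournament(scores, tsize, rng)
        b = tournament(scores, tsize, rng)
        # The genetic operation happens at the root: a new function node
        # takes the two selected individuals as its arguments.
        func = rng.choice(root_functions)
        child = [func(x, y) for x, y in zip(population[a], population[b])]
        # Steady state: the child replaces one individual immediately,
        # chosen here by a negative tournament.
        victim = tournament(scores, tsize, rng, best=False)
        population[victim] = child
        scores[victim] = fitness(child, target)
    winner = max(range(pop_size), key=lambda i: scores[i])
    return population[winner], scores[winner]


inputs = [0.0, 0.5, 1.0, 1.5, 2.0]
target = [x * x + x for x in inputs]  # hypothetical regression target
best, score = steady_state_gp(inputs, target)
print(score)
```

Because the negative tournament draws distinct individuals and removes the worse of them, the best score in the population never decreases from one step to the next.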

EvoDAG is described in the following conference paper: "EvoDAG: A semantic Genetic Programming Python library". Mario Graff, Eric S. Tellez, Sabino Miranda-Jiménez, Hugo Jair Escalante. 2016 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), pp. 1-6. A pre-print version can be downloaded from here.

A Baseline for Multilingual Sentiment Analysis (B4MSA)

B4MSA is a Python Sentiment Analysis Classifier for Twitter-like short texts. It can be used to create a first approximation to a sentiment classifier on any given language. It is almost language-independent, but it can take advantage of the particularities of a language.

It is written in Python and makes use of NLTK, scikit-learn, and gensim to create simple but effective sentiment classifiers.

microTC

microTC follows a minimalistic approach to text classification. It is designed to tackle text-classification problems in an agnostic way, being both domain and language independent.
Currently, it only produces single-label classifiers, but support for multi-label problems is on the roadmap.

microTC is intentionally simple, so only a small number of features were implemented. However, it uses some complex tools from gensim, numpy, and scikit-learn.

Lectures

INFOTEC

112 Circuito Tecnopolo Norte Col. Tecnopolo Pocitos II, C.P. 20313, Aguascalientes, Ags, México.

Tel. +52 (555) 624 28 00 Ext. 6315
email: mario.graff at infotec.mx