2/2 Papers accepted at EMNLP 2018


This has been a great conference year. I am very happy to have two papers (out of two) accepted at EMNLP 2018, arguably the last major NLP conference of the year. Once we have more details (code, etc.), I’ll update this website with a bit more information about each. In the meantime, it’s time to get ready for Brussels 🙂


Coling 2018

We are just a few days away from Coling 2018!


  • I will present the paper SeVeN: Augmenting Word Embeddings with Unsupervised Relation Vectors. The main idea is to learn relation vectors that abstract away from the properties of the individual words involved in the relation. The code and a working example are available here. Consider this example from the paper: our 10-dimensional vector encoding the relation between gmail and email is similar to those for ie and browser, firefox and browser, and even helvetica and font. Interestingly, when compared with the traditional vector difference baseline (diffvec), these similarities are more relational and less attributional, i.e., they rely less on the semantics of the words involved in the relation.


  • I will also be hosting the third edition of the SemDeep workshop. We will have a great mix of papers and invited talks, all focusing on knowledge representation, but from very different perspectives.

And of course, I am looking forward to the whole scientific program, meeting old friends, and the cooking lesson included in the social program 🙂
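The SeVeN bullet above compares the learned relation vectors against the diffvec baseline, which represents the relation between two words as the difference of their embeddings. As a rough illustration of that baseline only (not of SeVeN itself, whose relation vectors are learned in an unsupervised way as described in the paper), here is a minimal Python sketch that ranks word pairs by the cosine similarity of their difference vectors. The three-dimensional embeddings and the helper names `diffvec`, `cosine` and `rank_relations` are hypothetical; a real setup would load pretrained embeddings.

```python
import math

# Hypothetical toy 3-d embeddings; a real setup would load pretrained vectors.
EMBEDDINGS = {
    "gmail":   [0.9, 0.1, 0.3],
    "email":   [0.8, 0.7, 0.2],
    "firefox": [0.2, 0.1, 0.9],
    "browser": [0.1, 0.8, 0.8],
    "paris":   [0.5, 0.5, 0.5],
    "france":  [0.9, 0.2, 0.6],
}


def diffvec(a, b):
    """Represent the relation (a, b) as the vector difference v(b) - v(a)."""
    return [y - x for x, y in zip(EMBEDDINGS[a], EMBEDDINGS[b])]


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def rank_relations(query, candidates):
    """Sort candidate word pairs by how similar their diffvec is to the query pair's."""
    q = diffvec(*query)
    scored = [(pair, cosine(q, diffvec(*pair))) for pair in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_relations(("gmail", "email"), [("paris", "france"), ("firefox", "browser")])
```

With these made-up values, firefox and browser ranks above paris and france. Note that diffvec inherits whatever attributes the two words share with other words, which is exactly the attributional bias the SeVeN relation vectors aim to reduce.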

NAACL 2018


We are a bit more than a month away from NAACL 2018, and I thought I’d write a little about the work I am directly or indirectly involved in and what I’m most excited about this year, and hopefully whet your appetite for what is to come 🙂

Before the main conference, Jose Camacho-Collados, Mohammad Taher Pilehvar and I will give a tutorial on the Interplay between Lexical Resources and NLP. We will give an overview of the areas where corpus-based approaches to NLP interact with lexical resources and, conversely, of how NLP techniques can be leveraged to automatically improve the quality of existing resources (or create them from scratch). We will cover topics in computational lexicography, as well as knowledge-based embeddings (and their applications) and ways of informing neural networks with expert knowledge. We will make the outline of the tutorial available very soon!

In the main conference, I will present a short paper on identifying definition sentences in corpora. Our model achieves state-of-the-art results and may be used to ease the process of writing a glossary or dictionary, or as the first step in an ontology learning pipeline. The code is available here.

Finally, I am particularly excited about two SemEval tasks I have had the honor to co-organize: Task 2 on Multilingual Emoji Prediction and Task 9 on Hypernym Discovery. I will write about each of them in more detail in a few days (why we organized them, motivations, challenges, etc.), but for now let me just leave a couple of ideas out there.

  • The best system on hypernym discovery includes, as one of its components, a module that matches Hearst patterns.
  • The best system on multilingual emoji prediction uses an n-gram-based SVM classifier, not a neural network. These results are very interesting, and we are looking forward to discussing them during the workshop!
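The post does not describe how the winning system's pattern module works. For readers unfamiliar with Hearst patterns, here is a generic, simplified Python matcher (not that system's code) for three classic patterns: "X such as Y", "X including Y" and "Y and/or other X", where X is the hypernym and Y a list of hyponyms. It assumes every noun phrase is a single lowercase word, which real systems would not; the names `HEARST_PATTERNS` and `extract_hypernym_pairs` are hypothetical.

```python
import re

# Simplifying assumption: every noun phrase is a single lowercase word.
WORD = r"([a-z]+)"
# A coordinated list of words, e.g. "guitars, pianos and violins".
WORD_LIST = r"([a-z]+(?:(?:,? (?:and|or) |, )[a-z]+)*)"
# Separators used to split a coordinated list back into single words.
LIST_SEPARATOR = re.compile(r",? (?:and|or) |, ")

# Each entry: (compiled pattern, hypernym group index, hyponym-list group index).
HEARST_PATTERNS = [
    (re.compile(r"\b" + WORD + r",? such as " + WORD_LIST), 1, 2),   # X such as Y
    (re.compile(r"\b" + WORD + r",? including " + WORD_LIST), 1, 2),  # X including Y
    (re.compile(r"\b" + WORD_LIST + r",? (?:and|or) other " + WORD), 2, 1),  # Y and/or other X
]


def extract_hypernym_pairs(text):
    """Return (hyponym, hypernym) pairs matched by a few Hearst patterns."""
    text = text.lower()
    pairs = []
    for pattern, hyper_group, hypo_group in HEARST_PATTERNS:
        for match in pattern.finditer(text):
            hypernym = match.group(hyper_group)
            for hyponym in LIST_SEPARATOR.split(match.group(hypo_group)):
                pairs.append((hyponym, hypernym))
    return pairs


pairs = extract_hypernym_pairs(
    "Musical instruments such as guitars, pianos and violins are popular. "
    "Rock, jazz and other genres were played."
)
```

Patterns like these are high-precision but low-recall, which is presumably why a competitive system would use them as one component alongside others rather than on their own.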

It seems, then, that “old-fashioned” methods (linear models, pattern matching, etc.) still play a key role in achieving good performance in a number of NLP tasks. Personally, I think this is a good thing: it highlights that language cannot be modeled simply by throwing lots of data at a problem without considering the nuances of the linguistic phenomena we aim to model.
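To make the n-gram SVM result more concrete, here is a minimal, dependency-free Python sketch of a one-vs-rest linear SVM over word n-gram counts, trained with Pegasos-style stochastic subgradient descent on the hinge loss. It is not the winning system: the feature set (word unigrams and bigrams), the hyperparameters, the helper names and the toy emoji labels are all assumptions for illustration. In practice one would rather use an established implementation such as scikit-learn's `LinearSVC`.

```python
from collections import Counter, defaultdict


def ngram_features(text, n_max=2):
    """Sparse count vector of word n-grams (unigrams and bigrams by default)."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def dot(w, x):
    """Dot product between a weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(examples, lam=0.1, epochs=5):
    """Linear SVM via Pegasos-style subgradient descent; labels are +1 or -1."""
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, y in examples:  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)
            for f in w:  # L2 regularisation shrinks every weight
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active: move towards y * x
                for f, v in x.items():
                    w[f] += eta * y * v
    return w


def train_one_vs_rest(docs, labels, **kwargs):
    """Train one binary SVM per label (that label vs. all others)."""
    feats = [ngram_features(d) for d in docs]
    models = {}
    for label in sorted(set(labels)):
        examples = [(x, 1 if l == label else -1) for x, l in zip(feats, labels)]
        models[label] = train_binary_svm(examples, **kwargs)
    return models


def predict(models, text):
    """Pick the label whose classifier gives the highest score."""
    x = ngram_features(text)
    return max(models, key=lambda label: dot(models[label], x))


docs = ["i love pizza", "pizza night", "happy birthday", "birthday cake"]
labels = [":pizza:", ":pizza:", ":birthday:", ":birthday:"]
models = train_one_vs_rest(docs, labels)
```

With these toy examples, `predict(models, "pizza tonight")` returns `":pizza:"`, because the only known feature, "pizza", received positive weight in that classifier and negative weight in the other.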

2017 Highlights

  • October 25th – We wrote a blog post explaining our method for comparing the use of emojis in Barcelona and Madrid, and it won 2nd prize in the scientific popularization award organized by the Catalan Association of Artificial Intelligence! Here is the paper, and here is the awarded content.
  • September 7th – Our paper titled Towards the Understanding of Gaming Audiences by Modeling Twitch Emotes, which was presented at WNUT-17, won the best paper award. Congrats to Francesco Barbieri, Miguel Ballesteros, Joan Soler and Horacio Saggion!
  • September 4th – Our Master’s student Albert Fullana received a grade of 8/10 for his work on predicting sentiment and usefulness in Amazon reviews, congrats!
  • July 17th – I defended my PhD thesis, which was awarded the grade of Summa Cum Laude! It is available at the UPF repository.
  • June 13th – Our paper titled ELMDist: A Vector Space Model with Words and MusicBrainz Entities received the best paper award at the SemDeep workshop at ESWC!
  • January 30th – Gave an extended version of our NLP for MIR tutorial at UPF. Videolectures here and slides here.

2016 Highlights

  • October 21st – Received the best poster award at the Catalan Conference of Artificial Intelligence for our paper titled Finding and Expanding Hypernymic Relations in the Music Domain, with Sergio Oramas, José Camacho-Collados and Horacio Saggion.
  • August 7th – Gave a tutorial on Natural Language Processing at ISMIR 2016 together with Sergio Oramas and Shuo Zhang, advised by Xavier Serra and Horacio Saggion. The slides are available here.