Hopinion Corpus

Hopinion is a corpus of opinions in Spanish. Hopinion contains 17,934 opinions (2,388,848 words), mainly about hotels, from the TripAdvisor website.

The opinions are accompanied by linguistic information and metadata. As for linguistic information, 4,740 texts are annotated with the lemma and morphological category of each word. The metadata describe users and items. For users, the retrieved information includes alias, gender, age, place of origin, style, and purpose of the trip. For items, it includes the type of accommodation, its category (number of stars), the score given by the user and by other travelers, its location, etc.
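
To give a rough sense of scale, the counts quoted in this description can be turned into two derived figures: the average length of an opinion and the share of texts that carry lemma and morphological annotation. The short Python sketch below only does arithmetic on those published counts. The variable names are illustrative and are not part of the corpus distribution.

```python
# Counts taken from the corpus description; names are illustrative only.
TOTAL_OPINIONS = 17_934
TOTAL_WORDS = 2_388_848
ANNOTATED_TEXTS = 4_740

# Average number of words per opinion.
avg_words_per_opinion = TOTAL_WORDS / TOTAL_OPINIONS

# Fraction of opinions annotated with lemma and morphological category.
annotated_share = ANNOTATED_TEXTS / TOTAL_OPINIONS

print(f"Average words per opinion: {avg_words_per_opinion:.1f}")
print(f"Annotated share: {annotated_share:.1%}")
```

This works out to about 133 words per opinion, with roughly a quarter of the texts (about 26%) linguistically annotated.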

Additionally, Hopinion incorporates the results (annotations, frequencies, etc.) of various experiments conducted on the base data. For these results, it is recommended to read the publication associated with each experiment. The file LEEME.txt contains more details about this resource.

Please cite this resource as follows:

Roberto, John A., M. Antònia Martí, Maria Salomó (2012). ‘Análisis de la riqueza léxica en el contexto de la clasificación de atributos demográficos latentes’. Procesamiento del Lenguaje Natural, Vol. 48: 97-104.

Download Hopinion here.