The code and cleaned data for my Arabic opinion target paper can be found here.
You can cite this paper:
Noura Farra and Kathleen McKeown. 2017. SMARTies: Sentiment Models for Arabic Target Entities. In EACL 2017.
Here you can download the first revision of the Arabic Opinion Target corpus, which we collected by annotating online comments from the Aljazeera newspaper. The corpus is crowdsourced and covers topics from three main domains: Politics, Culture, and Sports. The overall entity-level sentiment agreement on the corpus is about 91%. Targets on which annotators disagreed are labeled 'undetermined'.
Please note that the original Aljazeera comments have not been corrected for spelling or grammar errors. We keep the raw text in its original form.
If you use this data, please cite this paper:
Noura Farra, Kathleen McKeown, and Nizar Habash. 2015. Annotating Targets of Opinions in Arabic Using Crowdsourcing. In Proceedings of the ACL-2015 Workshop on Arabic Natural Language Processing (ANLP 2015).
For more information about the Aljazeera QALB corpus from which we selected the comments, please refer to this paper:
Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Oussama Obeid, Nadi Tomeh, Alla Rozovskaya, Noura Farra, Sara Alkuhlani, and Kamal Oflazer. 2014. Large Scale Arabic Error Annotation: Guidelines and Framework. In Proceedings of LREC.
For any questions or issues with the corpus, please contact noura@cs.columbia.edu.