Hongzhi Xu

Postdoctoral Researcher
Department of Computer and Information Science, University of Pennsylvania.

Email: hongz.xu@gmail.com
Tel: (267) 506 9856
Address: 223 Towne Building, 220 South 33rd Street, Philadelphia, PA 19104, USA.

General Research Interests

Semantics, Chinese Linguistics, Computational Linguistics, Computational Morphology

Education

2011–2015        Ph.D. in Linguistics, The Hong Kong Polytechnic University

2005–2008        M.S. in Software Engineering, Tsinghua University

2000–2004        B.S. in Computer Science, Chengdu University of Technology

Research Experience

2015–Present    Postdoctoral Researcher, Department of Computer and Information Science, University of Pennsylvania.
Working with Mitch Marcus, Lyle Ungar, and Charles Yang on unsupervised morphological analysis across multiple languages under the DARPA-funded LORELEI project. Our model outperforms state-of-the-art systems not only on the morphological segmentation task itself but also when used in the machine translation (MT) and named entity recognition (NER) tasks of the same project.

2014–2015        Research Assistant, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University
Supervisors: Prof. Dingxu Shi and Prof. Chu-Ren Huang.
Responsibility: constructing an example database with a query interface for the readers of the new book “A Reference Grammar of Chinese”.

2008–2011        Assistant Researcher, NEC Laboratories China
I participated in two research projects over three years. The first was relation extraction between companies. The second was sentiment analysis of Chinese customer reviews, where my work was to exploit linguistic structures to expand sentiment dictionaries from large unlabeled corpora.

Teaching and Services

2014–2017        Program Committee Member of international conferences: Chinese Lexical Semantics Workshop (CLSW) 2015, 2016 & 2017; IALP (International Association of Asian Language Processing) 2016 & 2017. Reviewer for the journal Lingua Sinica, 2014 & 2016.

2012–2013        Teaching Assistant for Chinese Writing, Department of Chinese and Bilingual Studies (CBS), PolyU. I gave individual feedback, mainly on grammar, on the Chinese writing assignments of more than 50 local Hong Kong undergraduate students.

2007.9–12        Teaching Assistant for the Data Mining course, Tsinghua University. My responsibilities were to assign homework, answer questions on exercises, and teach two classes reviewing the students' homework and term papers.

Honors and Awards

2015                 SemEval 2015 Task 11: Sentiment Analysis of Figurative Language in Twitter. Among 15 teams, our team ranked 1st in sentiment intensity identification for ironic tweets (1/15) and 3rd on all tweets (3/15).

2008                 Distinguished Master Thesis Award granted by Tsinghua University (5/120).

2008                 Outstanding Graduate Student Award granted by Tsinghua University (2/120).

2007                 SSRT Research Funding granted by the School of Software, Tsinghua University.

Linguistic Resources and Tools

2015                  Hongzhi Xu, Chu-Ren Huang, and Dingxu Shi. CRG Example Database. Each example is annotated with one or more of the 1,324 grammar points discussed in the book “A Reference Grammar of Chinese”, and each grammar point is associated with roughly 200 examples. The database contains 274,329 examples in total.

2015                  Karl Neergaard, Hongzhi Xu, Chu-Ren Huang. Chinese Phonological Neighborhood Database, hosted by Linguistic Data Consortium (LDC). The database contains phonological neighbors and statistical information of Chinese words based on 14 different segmentation schemes with and without tones.

2014                  Hongzhi Xu and Anna Laszlo. English Soap Opera Subtitles Database for Irony and Sarcasm Studies, built from the subtitles of 23 soap operas and containing 135M words.

2013                  Chu-Ren Huang, Sophia Yat-Mei Lee, Ying Chen, Shoushan Li and Hongzhi Xu. Chinese Event-based Emotion Corpus, hosted by Linguistic Data Consortium (LDC), containing 8,973 examples.

2012                  Hongzhi Xu. Offline Corpus Search Tool. An offline corpus search engine that provides functions similar to those of online systems such as the Sketch Engine (UK). It supports a corpus query language for complex searches, lets users search any offline corpus, and saves the results together with the required statistical information.

Publications

Conference Papers

Karl Neergaard, Hongzhi Xu, and Chu-Ren Huang. 2016. Database of Mandarin Neighborhood Statistics. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). pp. 23–28. Portorož, Slovenia.

Qingqing Zhao, Chu-Ren Huang, and Hongzhi Xu. 2015. Auditory Synaesthesia and Near Synonyms: A Corpus-Based Analysis of sheng1 and yin1 in Mandarin Chinese. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation (PACLIC 2015). pp. 315–322. Shanghai, China.

Piyoros Tungthamthiti, Enrico Santus, Hongzhi Xu, Chu-Ren Huang and Kiyoaki Shirai. 2015. Sentiment Analyzer with Rich Features for Ironic and Sarcastic Tweets. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation (PACLIC 2015). pp. 178–187. Shanghai, China.

Hongzhi Xu, Enrico Santus, Anna Laszlo and Chu-Ren Huang. 2015. LLT-PolyU: Identifying Sentiment Intensity in Ironic Tweets. SemEval 2015 Task 11: Sentiment Analysis of Figurative Language in Twitter, co-located with the North American Chapter of the Association for Computational Linguistics (NAACL 2015). Denver, Colorado, U.S.A.

Hongzhi Xu, Dingxu Shi and Chu-Ren Huang. 2015. A New Categorization Framework for Chinese Adverbs. The 16th Chinese Lexical Semantics Workshop (CLSW 2015), LNAI. Beijing, China.

Hongzhi Xu and Chu-Ren Huang. 2014. Annotate and Identify Modalities, Speech Acts and Finer-Grained Event Types in Chinese Text. COLING Workshop on Lexical and Grammatical Resources for Language Processing. Dublin, Ireland.

Jingxia Lin, Hongzhi Xu, Menghan Jiang and Chu-Ren Huang. 2014. Annotation and Classification of Light Verbs and Light Verb Variations in Mandarin Chinese. COLING Workshop on Lexical and Grammatical Resources for Language Processing. Dublin, Ireland.

Chu-Ren Huang, Jingxia Lin, Menghan Jiang and Hongzhi Xu. 2014. Corpus-based Study and Identification of Mandarin Chinese Light Verb Variations. COLING Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects. Dublin, Ireland.

Hongzhi Xu and Chu-Ren Huang. 2013. A Rule System for Chinese Time Entity Recognition by Comprehensive Linguistic Study. The 6th International Joint Conference on Natural Language Processing (IJCNLP 2013). Nagoya, Japan.

Hongzhi Xu and Chu-Ren Huang. 2013. Primitives of Events and the Semantic Representation. The 6th International Conference on Generative Approaches to the Lexicon (GL 2013). Pisa, Italy.

Shan Wang, Chu-Ren Huang and Hongzhi Xu. 2012. Compositionality of NN Compounds: A Case Study on [N1+Artifactual-Type Event Nouns]. In Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation (PACLIC 2012). pp. 70–79. Bali, Indonesia.

Jingxia Lin, Chu-Ren Huang, Huarui Zhang and Hongzhi Xu. 2012. The Headedness of Mandarin Chinese Serial Verb Constructions: A Corpus-Based Study. In Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation (PACLIC 2012). Bali, Indonesia. (Best Paper Award)

Hongzhi Xu, Helen Kaiyun Chen, Chu-Ren Huang, Qin Lu, Tin-Shing Chiu and Dingxu Shi. 2012. A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2012). Istanbul, Turkey.

Hongzhi Xu, Kai Zhao, Likun Qiu and Changjian Hu. 2010. Expanding Chinese Sentiment Dictionaries from Large Scale Unlabeled Corpus. In Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation (PACLIC 2010). Sendai, Japan.

Hongzhi Xu, Changjian Hu and Guoyang Shen. 2009. Discovery of Dependency Tree Patterns for Relation Extraction. In Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC 2009). Hong Kong, China.

Hongzhi Xu and Chunping Li. 2008. Combining context features by Canonical Belief Network for Chinese Part-of-Speech Tagging. The Third International Joint Conference on Natural Language Processing (IJCNLP 2008). Hyderabad, India.

Hongzhi Xu and Chunping Li. 2007. A Novel Term Weighting Scheme for Automated Text Categorization. In Proceedings of the 7th International Conference on Intelligent Systems Design and Applications (ISDA 2007). Rio de Janeiro, Brazil.

Conference Presentations and Talks

Hongzhi Xu. 2017. Unsupervised Morphology Learning with Statistical Paradigms. CLUNCH at Penn.

Francesca Strik Lievers, Hongzhi Xu and Ge Xu. 2013. A Methodology for the Extraction of Lexicalized Synaesthesia from Corpora. Presented at the 19th International Congress of Linguists (ICL 2013). Geneva, Switzerland.

Hongzhi Xu and Chu-Ren Huang. 2013. The Generative Lexicon for Chinese Lexical Semantics: A Case Study on chī (eat). Annual Conference of the International Association of Chinese Linguistics (IACL 2013). Taipei, Taiwan.

Hongzhi Xu and Shan Wang. 2012. Chinese Relative Clause: Descriptive or Restrictive. Annual Conference of the International Association of Chinese Linguistics (IACL 2012). Hong Kong, China.