2020
Papalampidi, Pinelopi; Keller, Frank; Frermann, Lea; Lapata, Mirella: Screenplay Summarization Using Latent Narrative Structure. Inproceedings: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pp. 1920–1933, Association for Computational Linguistics, 2020.
Abstract: Most general-purpose extractive summarization models are trained on news articles, which are short and present all important information upfront. As a result, such models are biased towards position and often perform a smart selection of sentences from the beginning of the document. When summarizing long narratives, which have complex structure and present information piecemeal, simple position heuristics are not sufficient. In this paper, we propose to explicitly incorporate the underlying structure of narratives into general unsupervised and supervised extractive summarization models. We formalize narrative structure in terms of key narrative events (turning points) and treat it as latent in order to summarize screenplays (i.e., extract an optimal sequence of scenes). Experimental results on the CSI corpus of TV screenplays, which we augment with scene-level summarization labels, show that latent turning points correlate with important aspects of a CSI episode and improve summarization performance over general extractive algorithms, leading to more complete and diverse summaries.
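A schematic sketch (not the authors' implementation) of how latent turning points can steer extractive scene selection: each scene's base importance is boosted by its proximity to an inferred turning point before the top-scoring scenes are extracted in narrative order. The scores, boost scheme, and scene indices below are invented for illustration.

```python
# Hypothetical illustration: boost scene scores near latent turning points,
# then extract the top-k scenes in their original narrative order.

def summary_scenes(base_scores, turning_points, k=3, boost=0.5, width=2):
    scores = list(base_scores)
    for tp in turning_points:
        for i in range(len(scores)):
            if abs(i - tp) <= width:
                scores[i] += boost / (1 + abs(i - tp))
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sorted(ranked[:k])  # restore narrative order

print(summary_scenes([0.2, 0.9, 0.1, 0.4, 0.3, 0.8, 0.2], turning_points=[1, 5]))
```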
Chen, Patrick; Bogoychev, Nikolay; Heafield, Kenneth; Kirefu, Faheem: Parallel Sentence Mining by Constrained Decoding. Inproceedings: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pp. 1672–1678, Association for Computational Linguistics, 2020. DOI: 10.18653/v1/2020.acl-main.152.
Abstract: We present a novel method to extract parallel sentences from two monolingual corpora, using neural machine translation. Our method relies on translating sentences in one corpus, but constraining the decoding by a prefix tree built on the other corpus. We argue that a neural machine translation system by itself can be a sentence similarity scorer and it efficiently approximates pairwise comparison with a modified beam search. When benchmarked on the BUCC shared task, our method achieves results comparable to other submissions.
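A minimal sketch of the data structure behind the constraint: a prefix tree over the tokenized target-side corpus restricts which tokens beam search may emit at each step, so hypotheses that leave the corpus are pruned. The toy corpus and helper names are assumptions; the paper's actual decoder integration is more involved.

```python
# Build a nested-dict trie over tokenized sentences and query legal
# continuations of a decoding prefix (illustrative, not the authors' code).

def build_prefix_tree(sentences):
    root = {}
    for sent in sentences:
        node = root
        for tok in sent:
            node = node.setdefault(tok, {})
        node["<eos>"] = {}  # mark a complete sentence
    return root

def allowed_next(tree, prefix):
    """Return the set of tokens that can legally extend `prefix`."""
    node = tree
    for tok in prefix:
        if tok not in node:
            return set()  # prefix leaves the corpus; prune this hypothesis
        node = node[tok]
    return set(node.keys())

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]
trie = build_prefix_tree(corpus)
print(allowed_next(trie, ["the"]))         # {'cat', 'dog'}
print(allowed_next(trie, ["the", "cat"]))  # {'sat'}
```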
Zhang, Biao; Williams, Philip; Titov, Ivan; Sennrich, Rico: Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation. Inproceedings: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pp. 1628–1639, Association for Computational Linguistics, 2020. DOI: 10.18653/v1/2020.acl-main.148.
Abstract: Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via language-specific components and deepening NMT architectures. We identify the off-target translation issue (i.e. translating into a wrong target language) as the major source of the inferior zero-shot performance, and propose random online backtranslation to enforce the translation of unseen training language pairs. Experiments on OPUS-100 (a novel multilingual dataset with 100 languages) show that our approach substantially narrows the performance gap with bilingual models in both one-to-many and many-to-many settings, and improves zero-shot performance by ~10 BLEU, approaching conventional pivot-based methods.
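A schematic sketch of random online backtranslation (ROBT) as described in the abstract: for each training pair, the current model translates the target side into a randomly sampled language, creating a synthetic pair for an otherwise unseen translation direction. The `translate` stub and language codes are placeholders for the multilingual model itself.

```python
import random

LANGS = ["de", "fr", "zh", "ru"]

def translate(sentence, tgt_lang):
    # Placeholder: during training this is the multilingual NMT model itself.
    return f"<to_{tgt_lang}> {sentence}"

def robt_batch(batch):
    """Augment (src, tgt) pairs with synthetic zero-shot pairs."""
    extra = []
    for src, tgt in batch:
        lang = random.choice(LANGS)
        synthetic_src = translate(tgt, lang)  # back-translate the target side
        extra.append((synthetic_src, tgt))    # train on the unseen direction
    return batch + extra

print(robt_batch([("hello world", "hallo Welt")]))
```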
Bañón, Marta; Chen, Pinzhen; Haddow, Barry; Heafield, Kenneth; Hoang, Hieu; Esplà, Miquel; Forcada, Mikel; Kamran, Amir; Kirefu, Faheem; Koehn, Philipp; Ortiz-Rojas, Sergio; Pla, Leopoldo; Ramírez-Sánchez, Gema; Sarrías, Elsa; Strelec, Marek; Thompson, Brian; Waites, William; Wiggins, Dion; Zaragoza, Jaume: ParaCrawl: Web-Scale Acquisition of Parallel Corpora. Inproceedings: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pp. 4555–4567, Association for Computational Linguistics, 2020. DOI: 10.18653/v1/2020.acl-main.417.
Abstract: We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software. We empirically compare alternative methods and publish benchmark data sets for sentence alignment and sentence pair filtering. We also describe the parallel corpora released and evaluate their quality and their usefulness to create machine translation systems.
Wang, Chaojun; Sennrich, Rico: On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation. Inproceedings: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pp. 3544–3552, Association for Computational Linguistics, 2020. DOI: 10.18653/v1/2020.acl-main.326.
Abstract: The standard training algorithm in neural machine translation (NMT) suffers from exposure bias, and alternative algorithms have been proposed to mitigate this. However, the practical impact of exposure bias is under debate. In this paper, we link exposure bias to another well-known problem in NMT, namely the tendency to generate hallucinations under domain shift. In experiments on three datasets with multiple test domains, we show that exposure bias is partially to blame for hallucinations, and that training with Minimum Risk Training, which avoids exposure bias, can mitigate this. Our analysis explains why exposure bias is more problematic under domain shift, and also links exposure bias to the beam search problem, i.e. performance deterioration with increasing beam size. Our results provide a new justification for methods that reduce exposure bias: even if they do not increase performance on in-domain test sets, they can increase model robustness to domain shift.
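A toy sketch of the Minimum Risk Training objective referenced in the abstract: candidates sampled from the model are weighted by a sharpened model distribution, and the training signal is the expected cost (e.g. 1 - sentence-level BLEU) under that distribution. The scores, costs, and smoothing constant alpha below are illustrative assumptions.

```python
import math

def mrt_risk(log_probs, costs, alpha=0.005):
    """Expected cost under the sharpened model distribution.

    Q(y) is proportional to exp(alpha * log P(y|x));
    risk = sum over candidates y of Q(y) * cost(y, reference).
    """
    scaled = [alpha * lp for lp in log_probs]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    z = sum(weights)
    return sum(w / z * c for w, c in zip(weights, costs))

# Two toy hypotheses: one likely and good, one unlikely and bad.
print(mrt_risk(log_probs=[-2.0, -5.0], costs=[0.2, 0.9]))
```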
Bogoychev, Nikolay; Grundkiewicz, Roman; Aji, Alham Fikri; Behnke, Maximiliana; Heafield, Kenneth; Kashyap, Sidharth; Farsarakis, Emmanouil-Ioannis; Chudyk, Mateusz: Edinburgh's Submissions to the 2020 Machine Translation Efficiency Task. Inproceedings: Proceedings of the Fourth Workshop on Neural Generation and Translation (WNGT 2020), pp. 218–224, Association for Computational Linguistics, 2020. DOI: 10.18653/v1/2020.ngt-1.26.
Abstract: We participated in all tracks of the Workshop on Neural Generation and Translation 2020 Efficiency Shared Task: single-core CPU, multicore CPU, and GPU. At the model level, we use teacher-student training with a variety of student sizes, tie embeddings and sometimes layers, use the Simpler Simple Recurrent Unit, and introduce head pruning. On GPUs, we used 16-bit floating-point tensor cores. On CPUs, we customized 8-bit quantization and multiple processes with affinity for the multicore setting. To reduce model size, we experimented with 4-bit log quantization but use floats at runtime. In the shared task, most of our submissions were Pareto optimal with respect to the trade-off between time and quality.
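An illustrative sketch of one ingredient of the CPU submissions, 8-bit weight quantization: floats are scaled into int8 for storage and compute, and dequantized as needed. The symmetric max-abs scaling here is an assumption for clarity; the submission's customized scheme differs in detail.

```python
import numpy as np

def quantize_int8(w):
    """Map a float weight matrix to int8 with a single per-tensor scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```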
Wilmot, David; Keller, Frank: Modelling Suspense in Short Stories as Uncertainty Reduction over Neural Representation. Inproceedings: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pp. 1763–1788, Association for Computational Linguistics, 2020.
Abstract: Suspense is a crucial ingredient of narrative fiction, engaging readers and making stories compelling. While there is a vast theoretical literature on suspense, it is computationally not well understood. We compare two ways for modelling suspense: surprise, a backward-looking measure of how unexpected the current state is given the story so far; and uncertainty reduction, a forward-looking measure of how unexpected the continuation of the story is. Both can be computed either directly over story representations or over their probability distributions. We propose a hierarchical language model that encodes stories and computes surprise and uncertainty reduction. Evaluating against short stories annotated with human suspense judgements, we find that uncertainty reduction over representations is the best predictor, resulting in near human accuracy. We also show that uncertainty reduction can be used to predict suspenseful events in movie synopses.
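A toy sketch of the two measures compared in the paper, computed over story-state vectors: surprise looks backward at how far the current state moved from its predecessor, while uncertainty reduction looks forward at the expected distance to possible continuations. The random vectors below stand in for the hierarchical language model's encodings (an assumption for illustration).

```python
import numpy as np

def surprise(states):
    """Backward-looking: distance of each state from its predecessor."""
    return [float(np.linalg.norm(states[t] - states[t - 1]))
            for t in range(1, len(states))]

def expected_continuation_distance(state, continuations, probs):
    """Forward-looking ingredient of uncertainty reduction."""
    return float(sum(p * np.linalg.norm(c - state)
                     for c, p in zip(continuations, probs)))

rng = np.random.default_rng(0)
story = [rng.normal(size=8) for _ in range(5)]
print(surprise(story))
print(expected_continuation_distance(story[-1],
                                     [rng.normal(size=8) for _ in range(3)],
                                     [0.5, 0.3, 0.2]))
```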
Aji, Alham Fikri; Heafield, Kenneth: Compressing Neural Machine Translation Models with 4-bit Precision. Inproceedings: Proceedings of the Fourth Workshop on Neural Generation and Translation (WNGT 2020), pp. 35–42, Association for Computational Linguistics, 2020. DOI: 10.18653/v1/2020.ngt-1.4.
Abstract: Quantization is one way to compress Neural Machine Translation (NMT) models, especially for edge devices. This paper pushes quantization from 8 bits, seen in current work on machine translation, to 4 bits. Instead of fixed-point quantization, we use logarithmic quantization since parameters are skewed towards zero. We then observe that quantizing the bias terms in this way damages quality, so we leave them uncompressed. Bias terms are a tiny fraction of the model so the impact on compression rate is minimal. Retraining is necessary to preserve quality, for which we propose to use an error-feedback mechanism that treats compression errors like noisy gradients. We empirically show that NMT models based on the Transformer or RNN architectures can be compressed up to 4-bit precision without any noticeable quality degradation. Models can be compressed up to binary precision, albeit with lower quality. The RNN architecture appears more robust towards compression, compared to the Transformer.
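A numpy sketch of the central idea, logarithmic quantization: weights are snapped to signed powers of two (4 bits gives 8 magnitude levels plus sign), while bias terms would be left in float. The residual computed at the end is what an error-feedback mechanism would fold back into retraining. The exact level placement is an assumption; this is not the authors' code.

```python
import numpy as np

def log_quantize(w, bits=4):
    """Snap weights to signed powers of two within a 2^(bits-1)-level range."""
    sign = np.sign(w)
    max_exp = np.floor(np.log2(np.abs(w).max()))
    exponents = np.clip(np.round(np.log2(np.abs(w) + 1e-12)),
                        max_exp - 2 ** (bits - 1) + 1, max_exp)
    return sign * 2.0 ** exponents

w = np.random.randn(3, 3).astype(np.float32) * 0.1
wq = log_quantize(w)
residual = w - wq  # error feedback would treat this like gradient noise
print(wq)
```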
Aji, Alham Fikri; Bogoychev, Nikolay; Heafield, Kenneth; Sennrich, Rico: In Neural Machine Translation, What Does Transfer Learning Transfer? Inproceedings: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pp. 7701–7710, Association for Computational Linguistics, 2020. DOI: 10.18653/v1/2020.acl-main.688.
Abstract: Transfer learning improves quality for low-resource machine translation, but it is unclear what exactly it transfers. We perform several ablation studies that limit information transfer, then measure the quality impact across three language pairs to gain a black-box understanding of transfer learning. Word embeddings play an important role in transfer learning, particularly if they are properly aligned. Although transfer learning can be performed without embeddings, results are sub-optimal. In contrast, transferring only the embeddings but nothing else yields catastrophic results. We then investigate diagonal alignments with auto-encoders over real languages and randomly generated sequences, finding that even randomly generated sequences as parents yield noticeable but smaller gains. Finally, transfer learning can eliminate the need for a warm-up phase when training transformer models in high-resource language pairs.
Heafield, Kenneth; Hayashi, Hiroaki; Oda, Yusuke; Finch, Andrew; Neubig, Graham; Li, Xian; Birch, Alexandra: Findings of the Fourth Workshop on Neural Generation and Translation. Inproceedings: Proceedings of the Fourth Workshop on Neural Generation and Translation (WNGT 2020), pp. 1–9, Association for Computational Linguistics, 2020. DOI: 10.18653/v1/2020.ngt-1.1.
Abstract: We describe the findings of the Fourth Workshop on Neural Generation and Translation, held in concert with the annual conference of the Association for Computational Linguistics (ACL 2020). First, we summarize the research trends of papers presented in the proceedings. Second, we describe the results of the three shared tasks: 1) efficient neural machine translation (NMT), where participants were tasked with creating NMT systems that are both accurate and efficient; 2) document-level generation and translation (DGT), where participants were tasked with developing systems that generate summaries from structured data, potentially with assistance from text in another language; and 3) the STAPLE task, the creation of as many possible translations of a given input text. This last shared task was organised by Duolingo.
Stanojević, Miloš; Steedman, Mark: Max-Margin Incremental CCG Parsing. Inproceedings: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pp. 4111–4122, Association for Computational Linguistics, 2020.
Abstract: Incremental syntactic parsing has been an active research area both for cognitive scientists trying to model human sentence processing and for NLP researchers attempting to combine incremental parsing with language modelling for ASR and MT. Most effort has been directed at designing the right transition mechanism, but less has been done to answer the question of what a probabilistic model for those transition parsers should look like. The very incremental transition mechanism of a recently proposed CCG parser, when trained in a straightforward locally normalised discriminative fashion, produces very bad results on English CCGbank. We identify three biases as the causes of this problem: label bias, exposure bias and imbalanced probabilities bias. While known techniques for tackling these biases improve results, they still do not make the parser state of the art. Instead, we tackle all three biases at the same time using an improved version of beam search optimisation that minimises all beam search violations instead of minimising only the biggest violation. The new incremental parser gives better results than all previously published incremental CCG parsers, and outperforms even some widely used non-incremental CCG parsers.
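A schematic sketch of the training signal described in the abstract: rather than a hinge penalty on only the single biggest beam-search violation, the loss sums a margin penalty over every beam item that outscores the gold prefix at any step. The scores below are invented; the real objective operates over parser states and learned score functions.

```python
def all_violations_loss(gold_scores, beam_scores, margin=1.0):
    """Sum a hinge penalty over every beam item outscoring the gold prefix.

    gold_scores[t]: model score of the gold action-sequence prefix at step t.
    beam_scores[t]: scores of competing beam hypotheses at step t.
    """
    loss = 0.0
    for g, beam in zip(gold_scores, beam_scores):
        for b in beam:
            loss += max(0.0, margin + b - g)  # one hinge per violation
    return loss

print(all_violations_loss(gold_scores=[2.0, 1.5],
                          beam_scores=[[1.8, 0.5], [2.0, 1.0]]))
```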
Oprea, Silviu Vlad; Magdy, Walid: iSarcasm: A Dataset of Intended Sarcasm. Inproceedings: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pp. 1279–1289, Association for Computational Linguistics, 2020, ISBN: 978-1-952148-25-5. DOI: 10.18653/v1/2020.acl-main.118.
Abstract: We consider the distinction between intended and perceived sarcasm in the context of textual sarcasm detection. The former occurs when an utterance is sarcastic from the perspective of its author, while the latter occurs when the utterance is interpreted as sarcastic by the audience. We show the limitations of previous labelling methods in capturing intended sarcasm and introduce the iSarcasm dataset of tweets labeled for sarcasm directly by their authors. Examining state-of-the-art sarcasm detection models on our dataset showed low performance compared to previously studied datasets, which indicates that these datasets might be biased or obvious and that sarcasm could be a phenomenon under-studied computationally thus far. By providing the iSarcasm dataset, we aim to encourage future NLP research to develop methods for detecting sarcasm in text as intended by the authors of the text, not as labeled under assumptions that we demonstrate to be sub-optimal.
Stanojević, Miloš; Steedman, Mark: Span-Based LCFRS-2 Parsing. Inproceedings: Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2020), pp. 111–121, Association for Computational Linguistics, 2020, ISBN: 978-1-952148-11-8.
Abstract: The earliest models for discontinuous constituency parsers used mildly context-sensitive grammars, but the fashion has changed in recent years to grammar-less transition-based parsers that use strong neural probabilistic models to greedily predict transitions. We argue that grammar-based approaches still have something to contribute on top of what is offered by transition-based parsers. Concretely, by using a grammar formalism to restrict the space of possible trees we can use dynamic programming parsing algorithms for exact search for the most probable tree. Previous chart-based parsers for discontinuous formalisms used probabilistically weak generative models. We instead use a span-based discriminative neural model that preserves the dynamic programming properties of the chart parsers. Our parser does not use an explicit grammar, but it does use explicit grammar formalism constraints: we generate only trees that are within the LCFRS-2 formalism. These properties allow us to construct a new parsing algorithm that runs in a lower worst-case time complexity of O(l·n^4 + n^6), where n is the sentence length and l is the number of unique non-terminal labels. This parser is efficient in practice, provides the best results among chart-based parsers, and is competitive with the best transition-based parsers. We also show that the main bottleneck for further improvement in performance is the restriction of fan-out to degree 2. We show that well-nestedness is helpful in speeding up parsing, but lowers accuracy.
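To make the fan-out-2 restriction concrete, here is a tiny helper that computes a constituent's fan-out, the number of contiguous blocks of words it covers; LCFRS-2 admits only constituents with fan-out at most 2. The example index sets are invented.

```python
def fan_out(positions):
    """Number of maximal contiguous blocks covered by a set of word indices."""
    positions = sorted(positions)
    blocks = 1
    for a, b in zip(positions, positions[1:]):
        if b != a + 1:
            blocks += 1
    return blocks

print(fan_out({0, 1, 2}))     # 1 -> allowed (continuous constituent)
print(fan_out({0, 1, 4, 5}))  # 2 -> allowed in LCFRS-2 (one discontinuity)
print(fan_out({0, 2, 4}))     # 3 -> excluded by the fan-out-2 restriction
```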
Chen, Lucia Lushi; Magdy, Walid; Whalley, Heather; Wolters, Maria: Examining the Role of Mood Patterns in Predicting Self-reported Depressive Symptoms. Inproceedings: WebSci '20: 12th ACM Conference on Web Science, pp. 164–173, Association for Computing Machinery (ACM), United States, 2020. DOI: 10.1145/3394231.3397906.
Abstract: Researchers have explored automatic screening models as a quick way to identify potential risks of developing depressive symptoms. Most existing models often use a person's mood as reflected on social media at a single point in time as one of the predictive variables. In this paper, we study changes in mood over a period of one year using a mood profile, which explicitly models the changes of mood, and transitions between moods reflected on social media text. We used a subset of the "MyPersonality" Facebook data set that comprises users who have consented to and completed an assessment of depressive symptoms. The subset consists of 93,378 Facebook posts from 781 users. We observed less evidence of mood fluctuation expressed in social media text from those with low symptom measures compared to others with high symptom scores. Next, we leveraged a daily mood representation in Hidden Markov Models to determine its associations with subsequent self-reported symptoms. We found that individuals who have specific mood patterns are highly likely to have reported high depressive symptoms. However, not all of the high-symptom individuals necessarily displayed this characteristic, which indicates the presence of potential subgroups driving these findings. Finally, we leveraged multiple mood representations to characterize levels of depression symptoms with a logistic regression model. Our findings support the claim that mood derived from social media text can be used as a proxy for real-life mood to infer depressive symptoms in the current sample. Combining the mood representations with other proxy signals can potentially advance current automatic screening technology for research.
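A simplified sketch of the mood-pattern idea: represent each user as a sequence of daily mood states and summarise it with transition frequencies that can feed a downstream classifier. The paper fits Hidden Markov Models over a daily mood representation; this count-based version is a deliberately lighter stand-in, and the states and example sequence are invented.

```python
from collections import Counter

STATES = ["neg", "neu", "pos"]

def transition_features(seq):
    """Normalised frequencies of each mood-to-mood transition in a sequence."""
    counts = Counter(zip(seq, seq[1:]))
    total = max(sum(counts.values()), 1)
    return {f"{a}->{b}": counts.get((a, b), 0) / total
            for a in STATES for b in STATES}

user_days = ["neg", "neg", "neu", "neg", "pos", "neg"]
print(transition_features(user_days))
```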
Wilson, Steve; Magdy, Walid; McGillivray, Barbara; Tyson, Gareth: Analyzing Temporal Relationships between Trending Terms on Twitter and Urban Dictionary Activity. Inproceedings: WebSci '20: 12th ACM Conference on Web Science, pp. 155–163, Association for Computing Machinery (ACM), United States, 2020. DOI: 10.1145/3394231.3397905.
Abstract: As an online, crowd-sourced, open English-language slang dictionary, the Urban Dictionary platform contains a wealth of opinions, jokes, and definitions of terms, phrases, acronyms, and more. However, it is unclear exactly how activity on this platform relates to larger conversations happening elsewhere on the web, such as discussions on larger, more popular social media platforms. In this research, we study the temporal activity trends on Urban Dictionary and provide the first analysis of how this activity relates to content being discussed on a major social network: Twitter. By collecting the whole of Urban Dictionary, as well as a large sample of tweets over seven years, we explore the connections between the words and phrases that are defined and searched for on Urban Dictionary and the content that is talked about on Twitter. Through a series of cross-correlation calculations, we identify cases in which Urban Dictionary activity closely reflects the larger conversation happening on Twitter. Then, we analyze the types of terms that have a stronger connection to discussions on Twitter, finding that Urban Dictionary activity that is positively correlated with Twitter is centered around terms related to memes, popular public figures, and offline events. Finally, we explore the relationship between periods of time when terms are trending on Twitter and the corresponding activity on Urban Dictionary, revealing that new definitions are more likely to be added to Urban Dictionary for terms that are currently trending on Twitter.
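A minimal sketch of the kind of cross-correlation analysis described above: correlate a term's daily Twitter frequency against its Urban Dictionary activity at a range of lags and pick the strongest. The two synthetic series below are stand-ins for real daily counts.

```python
import numpy as np

def lagged_corr(x, y, max_lag=7):
    """Pearson correlation of x against y shifted by each lag (in days)."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            a, b = x[:lag], y[-lag:]
        elif lag > 0:
            a, b = x[lag:], y[:-lag]
        else:
            a, b = x, y
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out

rng = np.random.default_rng(1)
twitter = rng.poisson(5, 60).astype(float)
urban = np.roll(twitter, 2) + rng.normal(0, 1, 60)  # UD activity lags by 2 days
best = max(lagged_corr(twitter, urban).items(), key=lambda kv: kv[1])
print("strongest lag:", best)
```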
Meaney, Julie-Anne; Wilson, Steve R.; Magdy, Walid: Hahackathon: Incorporating Demographic Factors into Shared Humor Tasks. Inproceedings: Proceedings of the International Workshop on Semantic Evaluation 2020 (SemEval 2020), 2020.
Abstract: Humor detection is part of figurative language processing, an area which poses unique challenges in NLP due to its emphasis on multiple word senses, cultural knowledge, and pragmatics. However, recent shared tasks in humor classification have struggled with two issues: either the data comprises a highly constrained sub-type of humor (e.g. news headlines or hashtags submitted to a comedy show) which does not broadly represent the genre (Potash et al., 2017; Hossain et al., 2019), or the data is so indiscriminate that the inter-annotator agreement on its comic content is drastically low (Castro et al., 2018; Chiruzzo et al., 2019). This may result in the creation of humor detection systems which produce excellent evaluation results, but which may not scale to other humor datasets, improve downstream tasks like content moderation, or contribute to our understanding of humor.
Vamvas, Jannis; Sennrich, Rico: X-stance: A Multilingual Multi-Target Dataset for Stance Detection. Inproceedings: Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS), CEUR-WS.org, 2020.
Abstract: We extract a large-scale stance detection dataset from comments written by candidates of elections in Switzerland. The dataset consists of German, French and Italian text, allowing for a cross-lingual evaluation of stance detection. It contains 67,000 comments on more than 150 political issues (targets). Unlike stance detection models that have specific target issues, we use the dataset to train a single model on all the issues. To make learning across targets possible, we prepend to each instance a natural question that represents the target (e.g. "Do you support X?"). Baseline results from multilingual BERT show that zero-shot cross-lingual and cross-target transfer of stance detection is moderately successful with this approach.
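A small sketch of the target encoding described in the abstract: each comment is paired with a natural question naming the target, so a single model can be trained across all issues. The helper and example are invented; with a HuggingFace-style tokenizer the pair would typically be encoded as a two-segment input, e.g. tokenizer(question, comment).

```python
def make_instance(target, comment):
    """Prepend a natural question representing the target (x-stance style)."""
    question = f"Do you support {target}?"
    return (question, comment)  # a sentence pair for a BERT-style encoder

pair = make_instance(
    "raising the retirement age",
    "Our pension system is only sustainable if we work longer.",
)
print(pair)
```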
Kekulluoglu, Dilara; Magdy, Walid; Vaniea, Kami: Analysing Privacy Leakage of Life Events on Twitter. Inproceedings: WebSci '20: 12th ACM Conference on Web Science, pp. 287–294, Association for Computing Machinery (ACM), United States, 2020. DOI: 10.1145/3394231.3397919.
Abstract: People share a wide variety of information on Twitter, including the events in their lives, without understanding the size of their audience. While some of these events can be considered harmless, such as getting a new pet, some of them can be sensitive, such as gender-transition experiences. Every interaction increases the visibility of the tweets, and even if the original tweet is protected or deleted, public replies to it will stay on the platform. These replies might signal the events in the original tweet, which cannot be managed by the event subject. In this paper, we aim to understand the scope of life event disclosures for those with both public and protected (private) accounts. We collected 635k tweets with the phrase "happy for you" over four months. We found that roughly 10% of the tweets collected were celebrating a mentioned user's life event, ranging from marriage to surgery recovery. 8% of these tweets were directed at protected accounts. The majority of mentioned users also interacted with these tweets by liking, retweeting, or replying.
Bahgat, Mohamed; Wilson, Steve; Magdy, Walid: Towards Using Word Embedding Vector Space for Better Cohort Analysis. Inproceedings: Proceedings of the International AAAI Conference on Web and Social Media (ICWSM 2020), pp. 919–923, AAAI Press, 2020.
Abstract: On websites like Reddit, users join communities where they discuss specific topics which cluster them into possible cohorts. The authors within these cohorts have the opportunity to post more openly under the blanket of anonymity, and such openness provides a more accurate signal on the real issues individuals are facing. Some communities contain discussions about mental health struggles such as depression and suicidal ideation. To better understand and analyse these individuals, we propose to exploit properties of word embeddings that group related concepts close to each other in the embeddings space. For the posts from each topically situated sub-community, we build a word embeddings model and use handcrafted lexicons to identify emotions, values and psycholinguistically relevant concepts. We then extract insights into ways users perceive these concepts by measuring distances between them and references made by users either to themselves, others, or other things around them. We show how our proposed approach can extract meaningful signals that go beyond the kinds of analyses performed at the individual word level.
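An illustrative sketch of the proposed analysis under toy assumptions: train word embeddings on one sub-community's posts, then measure how close self-references sit to a handcrafted concept lexicon in the learned space. The three-post corpus and lexicon entry are invented; real models would be trained per community on far more text.

```python
from gensim.models import Word2Vec

posts = [
    ["i", "feel", "alone", "tonight"],
    ["nobody", "understands", "me"],
    ["i", "am", "tired", "and", "alone"],
]
model = Word2Vec(posts, vector_size=32, min_count=1, seed=0, epochs=50)

self_refs = ["i", "me"]
concept = ["alone", "tired"]  # stand-in for a handcrafted lexicon entry
print(model.wv.n_similarity(self_refs, concept))
```

gensim's n_similarity averages each word set before taking cosine similarity, which is the simplest realisation of "distance between references and concepts" in an embedding space.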
Rechkemmer, Amy; Wilson, Steve; Mihalcea, Rada Small Town or Metropolis? Analyzing the Relationship between Population Size and Language Inproceedings Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pp. 6287–6291, European Language Resources Association (ELRA), 2020, (12th Language Resources and Evaluation Conference, LREC 2020 ; Conference date: 11-05-2020 Through 16-05-2020). @inproceedings{d151fd6f0c604f92824701618093d810, title = {Small Town or Metropolis? Analyzing the Relationship between Population Size and Language}, author = {Amy Rechkemmer and Steve Wilson and Rada Mihalcea}, url = {https://lrec2020.lrec-conf.org/en/}, year = {2020}, date = {2020-05-16}, booktitle = {Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)}, pages = {6287–6291}, publisher = {European Language Resources Association (ELRA)}, abstract = {The variance in language used by different cultures has been a topic of study for researchers in linguistics and psychology, but oftentimes, language is compared across multiple countries in order to show a difference in culture. As a geographically large country that is diverse in population in terms of the background and experiences of its citizens, the U.S. also contains cultural differences within its own borders. Using a set of over 2 million posts from distinct Twitter users around the country dating back as far as 2014, we ask the following question: is there a difference in how Americans express themselves online depending on whether they reside in an urban or rural area? We categorize Twitter users as either urban or rural and identify ideas and language that are more commonly expressed in tweets written by one population over the other. We take this further by analyzing how the language from specific cities of the U.S. compares to the language of other cities and by training predictive models to predict whether a user is from an urban or rural area. We publicly release the tweet and user IDs that can be used to reconstruct the dataset for future studies in this direction.}, note = {12th Language Resources and Evaluation Conference, LREC 2020 ; Conference date: 11-05-2020 Through 16-05-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The variance in language used by different cultures has been a topic of study for researchers in linguistics and psychology, but oftentimes, language is compared across multiple countries in order to show a difference in culture. As a geographically large country that is diverse in population in terms of the background and experiences of its citizens, the U.S. also contains cultural differences within its own borders. Using a set of over 2 million posts from distinct Twitter users around the country dating back as far as 2014, we ask the following question: is there a difference in how Americans express themselves online depending on whether they reside in an urban or rural area? We categorize Twitter users as either urban or rural and identify ideas and language that are more commonly expressed in tweets written by one population over the other. We take this further by analyzing how the language from specific cities of the U.S. compares to the language of other cities and by training predictive models to predict whether a user is from an urban or rural area. We publicly release the tweet and user IDs that can be used to reconstruct the dataset for future studies in this direction. |
Wilson, Steve; Magdy, Walid; McGillivray, Barbara; Garimella, Kiran; Tyson, Gareth Urban Dictionary Embeddings for Slang NLP Applications Inproceedings Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pp. 4764–4773, European Language Resources Association (ELRA), 2020, (12th Language Resources and Evaluation Conference, LREC 2020 ; Conference date: 11-05-2020 Through 16-05-2020). @inproceedings{fa9d7a7bccdc4b90acaec42cc881ac60, title = {Urban Dictionary Embeddings for Slang NLP Applications}, author = {Steve Wilson and Walid Magdy and Barbara McGillivray and Kiran Garimella and Gareth Tyson}, url = {https://lrec2020.lrec-conf.org/en/}, year = {2020}, date = {2020-05-16}, booktitle = {Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)}, pages = {4764–4773}, publisher = {European Language Resources Association (ELRA)}, abstract = {The choice of the corpus on which word embeddings are trained can have a sizable effect on the learned representations, the types of analyses that can be performed with them, and their utility as features for machine learning models. To contribute to the existing sets of pre-trained word embeddings, we introduce and release the first set of word embeddings trained on the content of Urban Dictionary, a crowd-sourced dictionary for slang words and phrases. We show that although these embeddings are trained on fewer total tokens (by at least an order of magnitude compared to most popular pre-trained embeddings), they have high performance across a range of common word embedding evaluations, ranging from semantic similarity to word clustering tasks. Further, for some extrinsic tasks such as sentiment analysis and sarcasm detection where we expect to require some knowledge of colloquial language on social media data, initializing classifiers with the Urban Dictionary Embeddings resulted in improved performance compared to initializing with a range of other well-known, pre-trained embeddings that are an order of magnitude larger in size.}, note = {12th Language Resources and Evaluation Conference, LREC 2020 ; Conference date: 11-05-2020 Through 16-05-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The choice of the corpus on which word embeddings are trained can have a sizable effect on the learned representations, the types of analyses that can be performed with them, and their utility as features for machine learning models. To contribute to the existing sets of pre-trained word embeddings, we introduce and release the first set of word embeddings trained on the content of Urban Dictionary, a crowd-sourced dictionary for slang words and phrases. We show that although these embeddings are trained on fewer total tokens (by at least an order of magnitude compared to most popular pre-trained embeddings), they have high performance across a range of common word embedding evaluations, ranging from semantic similarity to word clustering tasks. Further, for some extrinsic tasks such as sentiment analysis and sarcasm detection where we expect to require some knowledge of colloquial language on social media data, initializing classifiers with the Urban Dictionary Embeddings resulted in improved performance compared to initializing with a range of other well-known, pre-trained embeddings that are an order of magnitude larger in size. |
Filgueira Vicente, Rosa ; Grover, Claire; Terras, Melissa; Alex, Beatrice Geoparsing the Historical Gazetteers of Scotland: Accurately Computing Location in Mass Digitised Texts Inproceedings Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora, pp. 24–30, European Language Resources Association (ELRA), 2020, (8th Workshop on the Challenges in the Management of Large Corpora, CMLC-8 ; Conference date: 16-05-2020 Through 16-05-2020). @inproceedings{d715474282dd482ca7bc5b2f792fd0f4, title = {Geoparsing the Historical Gazetteers of Scotland: Accurately Computing Location in Mass Digitised Texts}, author = {Rosa {Filgueira Vicente} and Claire Grover and Melissa Terras and Beatrice Alex}, url = {http://corpora.ids-mannheim.de/cmlc-2020.html}, year = {2020}, date = {2020-05-16}, booktitle = {Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora}, pages = {24–30}, publisher = {European Language Resources Association (ELRA)}, abstract = {This paper describes work in progress on devising automatic and parallel methods for geoparsing large digital historical textual data by combining the strengths of three natural language processing (NLP) tools, the Edinburgh Geoparser, spaCy and defoe, and employing different tokenisation and named entity recognition (NER) techniques. We apply these tools to a large collection of nineteenth century Scottish geographical dictionaries, and describe preliminary results obtained when processing this data.}, note = {8th Workshop on the Challenges in the Management of Large Corpora, CMLC-8 ; Conference date: 16-05-2020 Through 16-05-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } This paper describes work in progress on devising automatic and parallel methods for geoparsing large digital historical textual data by combining the strengths of three natural language processing (NLP) tools, the Edinburgh Geoparser, spaCy and defoe, and employing different tokenisation and named entity recognition (NER) techniques. We apply these tools to a large collection of nineteenth century Scottish geographical dictionaries, and describe preliminary results obtained when processing this data. |
Coleman, Susie; Secker, Andrew; Bawden, Rachel; Haddow, Barry; Birch-Mayne, Alexandra Architecture of a Scalable, Secure and Resilient Translation Platform for Multilingual News Media Inproceedings Rehm, Georg; Bontcheva, Kalina; Choukri, Khalid; Hajic, Jan; Piperidis, Stelios; Vasiljevs, Andrejs (Ed.): Proceedings of the 1st International Workshop on Language Technology Platforms, pp. 16–21, European Language Resources Association (ELRA), 2020, ISBN: 979-10-95546-64-1, (1st International Workshop on Language Technology Platforms, IWLTP 2020 ; Conference date: 16-05-2020 Through 16-05-2020). @inproceedings{9cbaa5080a6841dc9b3b8de334b6d8d2, title = {Architecture of a Scalable, Secure and Resilient Translation Platform for Multilingual News Media}, author = {Susie Coleman and Andrew Secker and Rachel Bawden and Barry Haddow and Alexandra Birch-Mayne}, editor = {Georg Rehm and Kalina Bontcheva and Khalid Choukri and Jan Hajic and Stelios Piperidis and Andrejs Vasiljevs}, url = {https://www.european-language-grid.eu/iwltp-2020/}, isbn = {979-10-95546-64-1}, year = {2020}, date = {2020-05-16}, booktitle = {Proceedings of the 1st International Workshop on Language Technology Platforms}, pages = {16–21}, publisher = {European Language Resources Association (ELRA)}, abstract = {This paper presents an example architecture for a scalable, secure and resilient Machine Translation (MT) platform, using components available via Amazon Web Services (AWS). It is increasingly common for a single news organisation to publish and monitor news sources in multiple languages. A growth in news sources makes this increasingly challenging and time-consuming, but MT can help automate some aspects of this process. Building a translation service provides a single integration point for newsroom tools that use translation technology, allowing MT models to be integrated into a system once, rather than each time the translation technology is needed. By using a range of services provided by AWS, it is possible to architect a platform where multiple pre-existing technologies are combined to build a solution, as opposed to developing software from scratch for deployment on a single virtual machine. This increases the speed at which a platform can be developed and allows the use of well-maintained services. However, a single service also presents challenges. It is key to consider how the platform will scale when handling many users and how to ensure the platform is resilient.}, note = {1st International Workshop on Language Technology Platforms, IWLTP 2020 ; Conference date: 16-05-2020 Through 16-05-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } This paper presents an example architecture for a scalable, secure and resilient Machine Translation (MT) platform, using components available via Amazon Web Services (AWS). It is increasingly common for a single news organisation to publish and monitor news sources in multiple languages. A growth in news sources makes this increasingly challenging and time-consuming, but MT can help automate some aspects of this process. Building a translation service provides a single integration point for newsroom tools that use translation technology, allowing MT models to be integrated into a system once, rather than each time the translation technology is needed. 
By using a range of services provided by AWS, it is possible to architect a platform where multiple pre-existing technologies are combined to build a solution, as opposed to developing software from scratch for deployment on a single virtual machine. This increases the speed at which a platform can be developed and allows the use of well-maintained services. However, a single service also presents challenges. It is key to consider how the platform will scale when handling many users and how to ensure the platform is resilient. |
Bansal, Sameer; Kamper, Herman; Lopez, Adam; Goldwater, Sharon Cross-Lingual Topic Prediction for Speech Using Translations Inproceedings ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8164–8168, Institute of Electrical and Electronics Engineers (IEEE), United States, 2020, ISBN: 978-1-5090-6632-2, (2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020). @inproceedings{e4cd3f75305f41aa9bef2d5dbc860cc8, title = {Cross-Lingual Topic Prediction for Speech Using Translations}, author = {Sameer Bansal and Herman Kamper and Adam Lopez and Sharon Goldwater}, doi = {10.1109/ICASSP40776.2020.9054169}, isbn = {978-1-5090-6632-2}, year = {2020}, date = {2020-05-14}, booktitle = {ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, pages = {8164--8168}, publisher = {Institute of Electrical and Electronics Engineers (IEEE)}, address = {United States}, abstract = {Given a large amount of unannotated speech in a low-resource language, can we classify the speech utterances by topic? We consider this question in the setting where a small amount of speech in the low-resource language is paired with text translations in a high-resource language. We develop an effective cross-lingual topic classifier by training on just 20 hours of translated speech, using a recent model for direct speech-to-text translation. While the translations are poor, they are still good enough to correctly classify the topic of 1-minute speech segments over 70% of the time—a 20% improvement over a majority-class baseline. Such a system could be useful for humanitarian applications like crisis response, where incoming speech in a foreign low-resource language must be quickly assessed for further action.}, note = {2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Given a large amount of unannotated speech in a low-resource language, can we classify the speech utterances by topic? We consider this question in the setting where a small amount of speech in the low-resource language is paired with text translations in a high-resource language. We develop an effective cross-lingual topic classifier by training on just 20 hours of translated speech, using a recent model for direct speech-to-text translation. While the translations are poor, they are still good enough to correctly classify the topic of 1-minute speech segments over 70% of the time—a 20% improvement over a majority-class baseline. Such a system could be useful for humanitarian applications like crisis response, where incoming speech in a foreign low-resource language must be quickly assessed for further action. |
Stoian, Mihaela C; Bansal, Sameer; Goldwater, Sharon Analyzing ASR Pretraining for Low-Resource Speech-to-Text Translation Inproceedings ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7909–7913, Institute of Electrical and Electronics Engineers (IEEE), United States, 2020, ISBN: 978-1-5090-6632-2, (2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020). @inproceedings{e1441c27e014416384f198baaed2ea26, title = {Analyzing ASR Pretraining for Low-Resource Speech-to-Text Translation}, author = {{Mihaela C } Stoian and Sameer Bansal and Sharon Goldwater}, doi = {10.1109/ICASSP40776.2020.9053847}, isbn = {978-1-5090-6632-2}, year = {2020}, date = {2020-05-14}, booktitle = {ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, pages = {7909--7913}, publisher = {Institute of Electrical and Electronics Engineers (IEEE)}, address = {United States}, abstract = {Previous work has shown that for low-resource source languages, automatic speech-to-text translation (AST) can be improved by pretraining an end-to-end model on automatic speech recognition (ASR) data from a high-resource language. However, it is not clear what factors—e.g., language relatedness or size of the pretraining data—yield the biggest improvements, or whether pretraining can be effectively combined with other methods such as data augmentation. Here, we experiment with pretraining on datasets of varying sizes, including languages related and unrelated to the AST source language. We find that the best predictor of final AST performance is the word error rate of the pretrained ASR model, and that differences in ASR/AST performance correlate with how phonetic information is encoded in the later RNN layers of our model. We also show that pretraining and data augmentation yield complementary benefits for AST.}, note = {2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Previous work has shown that for low-resource source languages, automatic speech-to-text translation (AST) can be improved by pretraining an end-to-end model on automatic speech recognition (ASR) data from a high-resource language. However, it is not clear what factors—e.g., language relatedness or size of the pretraining data—yield the biggest improvements, or whether pretraining can be effectively combined with other methods such as data augmentation. Here, we experiment with pretraining on datasets of varying sizes, including languages related and unrelated to the AST source language. We find that the best predictor of final AST performance is the word error rate of the pretrained ASR model, and that differences in ASR/AST performance correlate with how phonetic information is encoded in the later RNN layers of our model. We also show that pretraining and data augmentation yield complementary benefits for AST. |
Kamper, Herman; Matusevych, Yevgen; Goldwater, Sharon Multilingual Acoustic Word Embedding Models for Processing Zero-Resource Languages Inproceedings ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6414–6418, Institute of Electrical and Electronics Engineers (IEEE), United States, 2020, ISBN: 978-1-5090-6632-2, (2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020). @inproceedings{bb79f7c94bd843f89648326839919406, title = {Multilingual Acoustic Word Embedding Models for Processing Zero-Resource Languages}, author = {Herman Kamper and Yevgen Matusevych and Sharon Goldwater}, doi = {10.1109/ICASSP40776.2020.9054202}, isbn = {978-1-5090-6632-2}, year = {2020}, date = {2020-05-14}, booktitle = {ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, pages = {6414--6418}, publisher = {Institute of Electrical and Electronics Engineers (IEEE)}, address = {United States}, abstract = {Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. In settings where unlabelled speech is the only available resource, such embeddings can be used in “zero-resource” speech search, indexing and discovery systems. Here we propose to train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. For this transfer learning approach, we consider two multilingual recurrent neural network models: a discriminative classifier trained on the joint vocabularies of all training languages, and a correspondence autoencoder trained to reconstruct word pairs. We test these using a word discrimination task on six target zero-resource languages. When trained on seven well-resourced languages, both models perform similarly and outperform unsupervised models trained on the zero-resource languages. With just a single training language, the second model works better, but performance depends more on the particular training–testing language pair.}, note = {2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. In settings where unlabelled speech is the only available resource, such embeddings can be used in “zero-resource” speech search, indexing and discovery systems. Here we propose to train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. For this transfer learning approach, we consider two multilingual recurrent neural network models: a discriminative classifier trained on the joint vocabularies of all training languages, and a correspondence autoencoder trained to reconstruct word pairs. We test these using a word discrimination task on six target zero-resource languages. When trained on seven well-resourced languages, both models perform similarly and outperform unsupervised models trained on the zero-resource languages. With just a single training language, the second model works better, but performance depends more on the particular training–testing language pair. |
Abu Farha, Ibrahim ; Magdy, Walid Multitask Learning for Arabic Offensive Language and Hate-Speech Detection Inproceedings Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, pp. 86, European Language Resources Association (ELRA), 2020, (The 4th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT4 ; Conference date: 12-05-2020 Through 12-05-2020). @inproceedings{fb2f963314064b81aa0d0adc41d32fee, title = {Multitask Learning for Arabic Offensive Language and Hate-Speech Detection}, author = {Ibrahim {Abu Farha} and Walid Magdy}, url = {http://edinburghnlp.inf.ed.ac.uk/workshops/OSACT4/}, year = {2020}, date = {2020-05-12}, booktitle = {Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools}, pages = {86}, publisher = {European Language Resources Association (ELRA)}, abstract = {Offensive language and hate-speech are phenomena that spread with the rising popularity of social media. Detecting such content is crucial for understanding and predicting conflicts, understanding polarisation among communities and providing means and tools to filter or block inappropriate content. This paper describes the SMASH team submission to OSACT4's shared task on hate-speech and offensive language detection, where we explore different approaches to perform these tasks. The experiments cover a variety of approaches that include deep learning, transfer learning and multitask learning. We also explore the utilisation of sentiment information to perform the previous task. Our best model is a multitask learning architecture, based on CNN-BiLSTM, that was trained to detect hate-speech and offensive language and predict sentiment.}, note = {The 4th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT4 ; Conference date: 12-05-2020 Through 12-05-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Offensive language and hate-speech are phenomena that spread with the rising popularity of social media. Detecting such content is crucial for understanding and predicting conflicts, understanding polarisation among communities and providing means and tools to filter or block inappropriate content. This paper describes the SMASH team submission to OSACT4's shared task on hate-speech and offensive language detection, where we explore different approaches to perform these tasks. The experiments cover a variety of approaches that include deep learning, transfer learning and multitask learning. We also explore the utilisation of sentiment information to perform the previous task. Our best model is a multitask learning architecture, based on CNN-BiLSTM, that was trained to detect hate-speech and offensive language and predict sentiment. |
Mubarak, Hamdy; Darwish, Kareem; Magdy, Walid; Elsayed, Tamer; Al-Khalifa, Hend Overview of OSACT4 Arabic Offensive Language Detection Shared Task Inproceedings Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, pp. 48–52, European Language Resources Association (ELRA), 2020, (The 4th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT4 ; Conference date: 12-05-2020 Through 12-05-2020). @inproceedings{61d93e1d580c4a19a0442b7f67033eba, title = {Overview of OSACT4 Arabic Offensive Language Detection Shared Task}, author = {Hamdy Mubarak and Kareem Darwish and Walid Magdy and Tamer Elsayed and Hend Al-Khalifa}, url = {http://edinburghnlp.inf.ed.ac.uk/workshops/OSACT4/}, year = {2020}, date = {2020-05-12}, booktitle = {Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools}, pages = {48--52}, publisher = {European Language Resources Association (ELRA)}, abstract = {This paper provides an overview of the offensive language detection shared task at the 4th workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4). There were two subtasks, namely: Subtask A, involving the detection of offensive language, which contains unacceptable or vulgar content in addition to any kind of explicit or implicit insults or attacks against individuals or groups; and Subtask B, involving the detection of hate speech, which contains insults or threats targeting a group based on their nationality, ethnicity, race, gender, political or sport affiliation, religious belief, or other common characteristics. In total, 40 teams signed up to participate in Subtask A, and 14 of them submitted test runs. For Subtask B, 33 teams signed up to participate and 13 of them submitted runs. We present and analyze all submissions in this paper.}, note = {The 4th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT4 ; Conference date: 12-05-2020 Through 12-05-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } This paper provides an overview of the offensive language detection shared task at the 4th workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4). There were two subtasks, namely: Subtask A, involving the detection of offensive language, which contains unacceptable or vulgar content in addition to any kind of explicit or implicit insults or attacks against individuals or groups; and Subtask B, involving the detection of hate speech, which contains insults or threats targeting a group based on their nationality, ethnicity, race, gender, political or sport affiliation, religious belief, or other common characteristics. In total, 40 teams signed up to participate in Subtask A, and 14 of them submitted test runs. For Subtask B, 33 teams signed up to participate and 13 of them submitted runs. We present and analyze all submissions in this paper. |
Abu Farha, Ibrahim ; Magdy, Walid From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset Inproceedings Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, pp. 32–39, European Language Resources Association (ELRA), 2020, (The 4th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT4 ; Conference date: 12-05-2020 Through 12-05-2020). @inproceedings{057e9c8406364b9b98e9ad45d3422b05, title = {From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset}, author = {Ibrahim {Abu Farha} and Walid Magdy}, url = {http://edinburghnlp.inf.ed.ac.uk/workshops/OSACT4/}, year = {2020}, date = {2020-05-12}, booktitle = {Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools}, pages = {32--39}, publisher = {European Language Resources Association (ELRA)}, abstract = {Sarcasm is one of the main challenges for sentiment analysis systems. Its complexity comes from the expression of opinion using implicit indirect phrasing. In this paper, we present ArSarcasm, an Arabic sarcasm detection dataset, which was created through the reannotation of available Arabic sentiment analysis datasets. The dataset contains 10,547 tweets, 16% of which are sarcastic. In addition to sarcasm, the data was annotated for sentiment and dialects. Our analysis shows the highly subjective nature of these tasks, which is demonstrated by the shift in sentiment labels based on annotators' biases. Experiments show the degradation of state-of-the-art sentiment analysers when faced with sarcastic content. Finally, we train a deep learning model for sarcasm detection using BiLSTM. The model achieves an F1-score of 0.46, which shows the challenging nature of the task, and should act as a basic baseline for future research on our dataset.}, note = {The 4th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT4 ; Conference date: 12-05-2020 Through 12-05-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Sarcasm is one of the main challenges for sentiment analysis systems. Its complexity comes from the expression of opinion using implicit indirect phrasing. In this paper, we present ArSarcasm, an Arabic sarcasm detection dataset, which was created through the reannotation of available Arabic sentiment analysis datasets. The dataset contains 10,547 tweets, 16% of which are sarcastic. In addition to sarcasm, the data was annotated for sentiment and dialects. Our analysis shows the highly subjective nature of these tasks, which is demonstrated by the shift in sentiment labels based on annotators' biases. Experiments show the degradation of state-of-the-art sentiment analysers when faced with sarcastic content. Finally, we train a deep learning model for sarcasm detection using BiLSTM. The model achieves an F1-score of 0.46, which shows the challenging nature of the task, and should act as a basic baseline for future research on our dataset. |
Lopes, António V; Farajian, M Amin; Bawden, Rachel; Zhang, Michael; Martins, André F T Document-level Neural MT: A Systematic Comparison Inproceedings Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pp. 225–234, European Association for Machine Translation, 2020, (22nd Annual Conference of the European Association for Machine Translation, EAMT 2020 ; Conference date: 03-11-2020 Through 05-11-2020). @inproceedings{79a87a8b833846dd869d0283f8db0b2c, title = {Document-level Neural MT: A Systematic Comparison}, author = {{António V} Lopes and {M Amin} Farajian and Rachel Bawden and Michael Zhang and {André F T} Martins}, url = {https://eamt2020.inesc-id.pt/}, year = {2020}, date = {2020-05-06}, booktitle = {Proceedings of the 22nd Annual Conference of the European Association for Machine Translation}, pages = {225–234}, publisher = {European Association for Machine Translation}, abstract = {In this paper we provide a systematic comparison of existing and new document-level neural machine translation solutions. As part of this comparison, we introduce and evaluate a document-level variant of the recently proposed Star Transformer architecture. In addition to using the traditional metric BLEU, we report the accuracy of the models in handling anaphoric pronoun translation as well as coherence and cohesion using contrastive test sets. Finally, we report the results of human evaluation in terms of Multidimensional Quality Metrics (MQM) and analyse the correlation of the results obtained by the automatic metrics with human judgments.}, note = {22nd Annual Conference of the European Association for Machine Translation, EAMT 2020 ; Conference date: 03-11-2020 Through 05-11-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In this paper we provide a systematic comparison of existing and new document-level neural machine translation solutions. As part of this comparison, we introduce and evaluate a document-level variant of the recently proposed Star Transformer architecture. In addition to using the traditional metric BLEU, we report the accuracy of the models in handling anaphoric pronoun translation as well as coherence and cohesion using contrastive test sets. Finally, we report the results of human evaluation in terms of Multidimensional Quality Metrics (MQM) and analyse the correlation of the results obtained by the automatic metrics with human judgments. |
Li, Ruolan; Schatz, Thomas; Matusevych, Yevgen; Goldwater, Sharon; Feldman, Naomi H Input matters in the modeling of early phonetic learning Inproceedings Proceedings of the 42nd Annual Virtual Meeting of the Cognitive Science Society 2020, 2020, (42nd Annual Virtual Meeting of the Cognitive Science Society, CogSci 2020 ; Conference date: 29-07-2020 Through 01-08-2020). @inproceedings{6479303ea2864de4bbedf7d80a47d22b, title = {Input matters in the modeling of early phonetic learning}, author = {Ruolan Li and Thomas Schatz and Yevgen Matusevych and Sharon Goldwater and {Naomi H } Feldman}, url = {https://cognitivesciencesociety.org/cogsci-2020/}, year = {2020}, date = {2020-04-30}, booktitle = {Proceedings of the 42nd Annual Virtual Meeting of the Cognitive Science Society 2020}, abstract = {In acquiring language, differences in input can greatly affect learning outcomes, but which aspects of language learning are most sensitive to input variations, and which are robust, remains debated. A recent modeling study successfully reproduced a phenomenon empirically observed in early phonetic learning—learning about the sounds of the native language in the first year of life—despite using input that differed in quantity and speaker composition from what a typical infant would hear. In this paper, we carry out a direct test of that model's robustness to input variations. We find that, despite what the original result suggested, the learning outcomes are sensitive to properties of the input and that more plausible input leads to a better fit with empirical observations. This has implications for understanding early phonetic learning in infants and underscores the importance of using realistic input in models of language acquisition.}, note = {42nd Annual Virtual Meeting of the Cognitive Science Society, CogSci 2020 ; Conference date: 29-07-2020 Through 01-08-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In acquiring language, differences in input can greatly affect learning outcomes, but which aspects of language learning are most sensitive to input variations, and which are robust, remains debated. A recent modeling study successfully reproduced a phenomenon empirically observed in early phonetic learning—learning about the sounds of the native language in the first year of life—despite using input that differed in quantity and speaker composition from what a typical infant would hear. In this paper, we carry out a direct test of that model's robustness to input variations. We find that, despite what the original result suggested, the learning outcomes are sensitive to properties of the input and that more plausible input leads to a better fit with empirical observations. This has implications for understanding early phonetic learning in infants and underscores the importance of using realistic input in models of language acquisition. |
Schloeder, Julian; Lascarides, Alex Understanding Focus: Pitch, Placement and Coherence Journal Article Semantics and Pragmatics, 13 , pp. 1–48, 2020, ISSN: 1937-8912. @article{f1c955e41d5842d8b52e90c7f4b923b6, title = {Understanding Focus: Pitch, Placement and Coherence}, author = {Julian Schloeder and Alex Lascarides}, doi = {10.3765/sp.13.1}, issn = {1937-8912}, year = {2020}, date = {2020-04-30}, journal = {Semantics and Pragmatics}, volume = {13}, pages = {1--48}, abstract = {This paper presents a novel account of focal stress and pitch contour in English dialogue. We argue that one should analyse and treat focus and pitch contour jointly, since (i) some pragmatic interpretations vary with contour (e.g., whether an utterance accepts or rejects; or whether it implicates a positive or negative answer); and (ii) there are utterances with identical prosodic focus that in the same context are infelicitous with one contour, but felicitous with another. We offer an account of two distinct pitch contours that predicts the correct felicity judgements and implicatures, outclassing other models in empirical coverage or formality. Prosodic focus triggers a presupposition, where what is presupposed and how the presupposition is resolved depends on prosodic contour. If resolving the presupposition entails the proffered content, then the proffered content is uninteresting and hence the utterance is infelicitous. Otherwise, resolving the presupposition may lead to an implicature. We regiment this account in SDRT.}, keywords = {}, pubstate = {published}, tppubtype = {article} } This paper presents a novel account of focal stress and pitch contour in English dialogue. We argue that one should analyse and treat focus and pitch contour jointly, since (i) some pragmatic interpretations vary with contour (e.g., whether an utterance accepts or rejects; or whether it implicates a positive or negative answer); and (ii) there are utterances with identical prosodic focus that in the same context are infelicitous with one contour, but felicitous with another. We offer an account of two distinct pitch contours that predicts the correct felicity judgements and implicatures, outclassing other models in empirical coverage or formality. Prosodic focus triggers a presupposition, where what is presupposed and how the presupposition is resolved depends on prosodic contour. If resolving the presupposition entails the proffered content, then the proffered content is uninteresting and hence the utterance is infelicitous. Otherwise, resolving the presupposition may lead to an implicature. We regiment this account in SDRT. |
Ataman, Duygu; Aziz, Wilker; Birch, Alexandra A Latent Morphology Model for Open-Vocabulary Neural Machine Translation Inproceedings Proceedings of the International Conference on Learning Representations 2020, pp. 1–15, 2020, (Eighth International Conference on Learning Representations, ICLR 2020 ; Conference date: 26-04-2020 Through 30-04-2020). @inproceedings{648c66d1adac42ecb900411c1118cc02, title = {A Latent Morphology Model for Open-Vocabulary Neural Machine Translation}, author = {Duygu Ataman and Wilker Aziz and Alexandra Birch}, url = {https://iclr.cc/Conferences/2020}, year = {2020}, date = {2020-04-30}, booktitle = {Proceedings of the International Conference on Learning Representations 2020}, pages = {1--15}, abstract = {Translation into morphologically-rich languages challenges neural machine translation (NMT) models with extremely sparse vocabularies where atomic treatment of surface forms is unrealistic. This problem is typically addressed by either pre-processing words into subword units or performing translation directly at the level of characters. The former is based on word segmentation algorithms optimized using corpus-level statistics with no regard to the translation task. The latter learns directly from translation data but requires rather deep architectures. In this paper, we propose to translate words by modeling word formation through a hierarchical latent variable model which mimics the process of morphological inflection. Our model generates words one character at a time by composing two latent representations: a continuous one, aimed at capturing the lexical semantics, and a set of (approximately) discrete features, aimed at capturing the morphosyntactic function, which are shared among different surface forms. Our model achieves better accuracy in translation into three morphologically-rich languages than conventional open-vocabulary NMT methods, while also demonstrating a better generalization capacity under low to mid-resource settings.}, note = {Eighth International Conference on Learning Representations, ICLR 2020 ; Conference date: 26-04-2020 Through 30-04-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Translation into morphologically-rich languages challenges neural machine translation (NMT) models with extremely sparse vocabularies where atomic treatment of surface forms is unrealistic. This problem is typically addressed by either pre-processing words into subword units or performing translation directly at the level of characters. The former is based on word segmentation algorithms optimized using corpus-level statistics with no regard to the translation task. The latter learns directly from translation data but requires rather deep architectures. In this paper, we propose to translate words by modeling word formation through a hierarchical latent variable model which mimics the process of morphological inflection. Our model generates words one character at a time by composing two latent representations: a continuous one, aimed at capturing the lexical semantics, and a set of (approximately) discrete features, aimed at capturing the morphosyntactic function, which are shared among different surface forms. Our model achieves better accuracy in translation into three morphologically-rich languages than conventional open-vocabulary NMT methods, while also demonstrating a better generalization capacity under low to mid-resource settings. |
Matusevych, Yevgen; Schatz, Thomas; Kamper, Herman; Feldman, Naomi H; Goldwater, Sharon Evaluating computational models of infant phonetic learning across languages Inproceedings Proceedings of the 42nd Annual Virtual Meeting of the Cognitive Science Society 2020, 2020, (42nd Annual Virtual Meeting of the Cognitive Science Society, CogSci 2020 ; Conference date: 29-07-2020 Through 01-08-2020). @inproceedings{b8646a51f77f48a09cfe745ca195e4ab, title = {Evaluating computational models of infant phonetic learning across languages}, author = {Yevgen Matusevych and Thomas Schatz and Herman Kamper and {Naomi H } Feldman and Sharon Goldwater}, url = {https://cognitivesciencesociety.org/cogsci-2020/}, year = {2020}, date = {2020-04-30}, booktitle = {Proceedings of the 42nd Annual Virtual Meeting of the Cognitive Science Society 2020}, abstract = {In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. Many accounts of this early phonetic learning exist, but computational models predicting the attunement patterns observed in infants from the speech input they hear have been lacking. A recent study presented the first such model, drawing on algorithms proposed for unsupervised learning from naturalistic speech, and tested it on a single phone contrast. Here we study five such algorithms, selected for their potential cognitive relevance. We simulate phonetic learning with each algorithm and perform tests on three phone contrasts from different languages, comparing the results to infants' discrimination patterns. The five models display varying degrees of agreement with empirical observations, showing that our approach can help decide between candidate mechanisms for early phonetic learning, and providing insight into which aspects of the models are critical for capturing infants' perceptual development.}, note = {42nd Annual Virtual Meeting of the Cognitive Science Society, CogSci 2020 ; Conference date: 29-07-2020 Through 01-08-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. Many accounts of this early phonetic learning exist, but computational models predicting the attunement patterns observed in infants from the speech input they hear have been lacking. A recent study presented the first such model, drawing on algorithms proposed for unsupervised learning from naturalistic speech, and tested it on a single phone contrast. Here we study five such algorithms, selected for their potential cognitive relevance. We simulate phonetic learning with each algorithm and perform tests on three phone contrasts from different languages, comparing the results to infants' discrimination patterns. The five models display varying degrees of agreement with empirical observations, showing that our approach can help decide between candidate mechanisms for early phonetic learning, and providing insight into which aspects of the models are critical for capturing infants' perceptual development. |
Chen, Lucia Lushi; Magdy, Walid; Wolters, Maria The Effect of User Psychology on the Content of Social Media Posts: Originality and Transitions Matter Journal Article Frontiers in Psychology, 11 , 2020, ISSN: 1664-1078. @article{d724593d28104db5bfe82ecee5f5aab8, title = {The Effect of User Psychology on the Content of Social Media Posts: Originality and Transitions Matter}, author = {{Lucia Lushi} Chen and Walid Magdy and Maria Wolters}, doi = {10.3389/fpsyg.2020.00526}, issn = {1664-1078}, year = {2020}, date = {2020-04-21}, journal = {Frontiers in Psychology}, volume = {11}, publisher = {Frontiers Media S.A.}, abstract = {Multiple studies suggest that frequencies of affective words in social media text are associated with the user's personality and mental health. In this study, we re-examine these associations by looking at the transition patterns of affect. We analyzed the content originality and affect polarity of 4,086 posts from 70 adult Facebook users contributed over 2 months. We studied posting behavior, including silent periods when the user does not post any content. Our results show that more extroverted participants tend to post positive content continuously and that more agreeable participants tend to avoid posting negative content. We also observe that participants with stronger depression symptoms posted more non-original content. We recommend that transitions of affect pattern derived from social media text and content originality should be considered in further studies on mental health, personality, and social media.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Multiple studies suggest that frequencies of affective words in social media text are associated with the user's personality and mental health. In this study, we re-examine these associations by looking at the transition patterns of affect. We analyzed the content originality and affect polarity of 4,086 posts from 70 adult Facebook users contributed over 2 months. We studied posting behavior, including silent periods when the user does not post any content. Our results show that more extroverted participants tend to post positive content continuously and that more agreeable participants tend to avoid posting negative content. We also observe that participants with stronger depression symptoms posted more non-original content. We recommend that transitions of affect pattern derived from social media text and content originality should be considered in further studies on mental health, personality, and social media. |
Robertson, Alexander; Magdy, Walid; Goldwater, Sharon Emoji Skin Tone Modifiers: Analyzing Variation in Usage on Social Media Journal Article ACM Transactions on Social Computing (TSC), 3 (2), 2020, ISSN: 2469-7818. @article{c20ec05c0a534712bcc2e79a9a55a6b8, title = {Emoji Skin Tone Modifiers: Analyzing Variation in Usage on Social Media}, author = {Alexander Robertson and Walid Magdy and Sharon Goldwater}, doi = {10.1145/3396115}, issn = {2469-7818}, year = {2020}, date = {2020-04-19}, journal = {ACM Transactions on Social Computing (TSC)}, volume = {3}, number = {2}, publisher = {Association for Computing Machinery (ACM)}, abstract = {Emoji are widely used in computer-mediated communication to express concepts and emotions. Skin tone modifiers were added in 2015 with the hope of better representing user diversity, and indeed recent work has shown that these modifiers are especially popular amongst darker-skinned users, who are a minority on Twitter. Previous work also showed that the vast majority of tone-modified emoji have a tone similar to the user's own skin tone, suggesting that self-representation is a major factor in tone use. In this paper, we first show that the basic finding (users mainly choose a tone that is similar to their own skin tone) generalizes to different sub-populations of users, including users from majority-Black regions. We then extend the analysis of tone use to quantify and examine cases where users modulate their tones: that is, for a particular emoji, they choose either to use a different tone than their usual one, or no tone at all (after having previously used one). We show that even though these uses constitute only a small proportion of emoji usage, many instances are readily classifiable as ways of representing other people. The evidence we present is therefore crucial in working towards a broader understanding of the connection between emoji and identity expression online. We also offer explanations for why the darkest emoji skin tones are not used, by examining aspects of their design which make them less suited to self-representational usage. This highlights the need for careful consideration of both design and human diversity when creating emoji. Moreover, despite early fears in the media, we find little evidence of negative usage even when tones are used in a non-self-representational manner. In sum, our findings lend even more support to the highly positive role that emoji and skin tone modifiers play in identity and expression in computer-mediated communication.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Emoji are widely used in computer-mediated communication to express concepts and emotions. Skin tone modifiers were added in 2015 with the hope of better representing user diversity, and indeed recent work has shown that these modifiers are especially popular amongst darker-skinned users, who are a minority on Twitter. Previous work also showed that the vast majority of tone-modified emoji have a tone similar to the user's own skin tone, suggesting that self-representation is a major factor in tone use. In this paper, we first show that the basic finding (users mainly choose a tone that is similar to their own skin tone) generalizes to different sub-populations of users, including users from majority-Black regions. 
We then extend the analysis of tone use to quantify and examine cases where users modulate their tones: that is, for a particular emoji, they choose either to use a different tone than their usual one, or no tone at all (after having previously used one). We show that even though these uses constitute only a small proportion of emoji usage, many instances are readily classifiable as ways of representing other people. The evidence we present is therefore crucial in working towards a broader understanding of the connection between emoji and identity expression online. We also offer explanations for why the darkest emoji skin tones are not used, by examining aspects of their design which make them less suited to self-representational usage. This highlights the need for careful consideration of both design and human diversity when creating emoji. Moreover, despite early fears in the media, we find little evidence of negative usage even when tones are used in a non-self-representational manner. In sum, our findings lend even more support to the highly positive role that emoji and skin tone modifiers play in identity and expression in computer-mediated communication. |
Hermann, Enno; Kamper, Herman; Goldwater, Sharon Multilingual and Unsupervised Subword Modeling for Zero-Resource Languages Journal Article Computer Speech and Language, 65 , 2020, ISSN: 0885-2308. @article{1d8cc9024e6b4d6c85a868a9ae3d068d, title = {Multilingual and Unsupervised Subword Modeling for Zero-Resource Languages}, author = {Enno Hermann and Herman Kamper and Sharon Goldwater}, doi = {10.1016/j.csl.2020.101098}, issn = {0885-2308}, year = {2020}, date = {2020-04-17}, journal = {Computer Speech and Language}, volume = {65}, publisher = {Academic Press Inc.}, abstract = {Subword modeling for zero-resource languages aims to learn low-level representations of speech audio without using transcriptions or other resources from the target language (such as text corpora or pronunciation dictionaries). A good representation should capture phonetic content and abstract away from other types of variability, such as speaker differences and channel noise. Previous work in this area has primarily focused on unsupervised learning from target language data only, and has been evaluated only intrinsically. Here we directly compare multiple methods, including some that use only target language speech data and some that use transcribed speech from other (non-target) languages, and we evaluate using two intrinsic measures as well as on a downstream unsupervised word segmentation and clustering task. We find that combining two existing target-language-only methods yields better features than either method alone. Nevertheless, even better results are obtained by extracting target language bottleneck features using a model trained on other languages. Cross-lingual training using just one other language is enough to provide this benefit, but multilingual training helps even more. In addition to these results, which hold across both intrinsic measures and the extrinsic task, we discuss the qualitative differences between the different types of learned features.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Subword modeling for zero-resource languages aims to learn low-level representations of speech audio without using transcriptions or other resources from the target language (such as text corpora or pronunciation dictionaries). A good representation should capture phonetic content and abstract away from other types of variability, such as speaker differences and channel noise. Previous work in this area has primarily focused on unsupervised learning from target language data only, and has been evaluated only intrinsically. Here we directly compare multiple methods, including some that use only target language speech data and some that use transcribed speech from other (non-target) languages, and we evaluate using two intrinsic measures as well as on a downstream unsupervised word segmentation and clustering task. We find that combining two existing target-language-only methods yields better features than either method alone. Nevertheless, even better results are obtained by extracting target language bottleneck features using a model trained on other languages. Cross-lingual training using just one other language is enough to provide this benefit, but multilingual training helps even more. In addition to these results, which hold across both intrinsic measures and the extrinsic task, we discuss the qualitative differences between the different types of learned features. |
Stanojević, Miloš; Hale, John; Steedman, Mark Predictive Processing of Coordination in CCG Inproceedings Proceedings of the 33rd Annual CUNY Conference on Human Sentence Processing, 2020. @inproceedings{d6fd454bd05c425299bf6fc92a26eb42, title = {Predictive Processing of Coordination in CCG}, author = {Miloš Stanojević and John Hale and Mark Steedman}, year = {2020}, date = {2020-03-21}, booktitle = {Proceedings of the 33rd Annual CUNY Conference on Human Sentence Processing}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Oprea, Silviu; Magdy, Walid The Effect of Sociocultural Variables on Sarcasm Communication Online Inproceedings Proceedings of the ACM on Computer-Supported Cooperative Work and Social Computing, ACM Association for Computing Machinery, 2020, (The 23rd ACM Conference on Computer-Supported Cooperative Work and Social Computing, CSCW 2020 ; Conference date: 17-10-2020 Through 21-10-2020). @inproceedings{05c04d344ab94cd982b47711e6e34a86, title = {The Effect of Sociocultural Variables on Sarcasm Communication Online}, author = {Silviu Oprea and Walid Magdy}, url = {https://cscw.acm.org/2020/}, year = {2020}, date = {2020-03-11}, booktitle = {Proceedings of the ACM on Computer-Supported Cooperative Work and Social Computing}, volume = {4}, publisher = {ACM Association for Computing Machinery}, abstract = {Online social networks (OSN) play an essential role in connecting people and allowing them to communicate online. OSN users share their thoughts, moments, and news with their network. The messages they share online can include sarcastic posts, where the intended meaning expressed by the written text is different from the literal one. This could result in miscommunication. Previous research in psycholinguistics has studied the sociocultural factors that might lead to sarcasm misunderstanding between speakers and listeners. However, there is a lack of such studies in the context of OSNs. In this paper we fill this gap by performing a quantitative analysis of the influence of sociocultural variables, including gender, age, country, and English language nativeness, on the effectiveness of sarcastic communication online. We collect examples of sarcastic tweets directly from the authors who posted them. Further, we ask third-party annotators of different sociocultural backgrounds to label these tweets for sarcasm. Our analysis indicates that age, English language nativeness, and country are significantly influential and should be considered in the design of future social analysis tools that either study sarcasm directly, or look at related phenomena where sarcasm may have an influence. We also make observations about the social ecology surrounding sarcastic exchanges on OSNs. We conclude by suggesting ways in which our findings can be included in future work.}, note = {The 23rd ACM Conference on Computer-Supported Cooperative Work and Social Computing, CSCW 2020 ; Conference date: 17-10-2020 Through 21-10-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Online social networks (OSN) play an essential role in connecting people and allowing them to communicate online. OSN users share their thoughts, moments, and news with their network. The messages they share online can include sarcastic posts, where the intended meaning expressed by the written text is different from the literal one. This could result in miscommunication. Previous research in psycholinguistics has studied the sociocultural factors that might lead to sarcasm misunderstanding between speakers and listeners. However, there is a lack of such studies in the context of OSNs. In this paper we fill this gap by performing a quantitative analysis of the influence of sociocultural variables, including gender, age, country, and English language nativeness, on the effectiveness of sarcastic communication online. We collect examples of sarcastic tweets directly from the authors who posted them. Further, we ask third-party annotators of different sociocultural backgrounds to label these tweets for sarcasm.
Our analysis indicates that age, English language nativeness, and country are significantly influential and should be considered in the design of future social analysis tools that either study sarcasm directly, or look at related phenomena where sarcasm may have an influence. We also make observations about the social ecology surrounding sarcastic exchanges on OSNs. We conclude by suggesting ways in which our findings can be included in future work. |
Sykes, Dominic; Grivas, Andreas; Grover, Claire; Tobin, Richard; Sudlow, Catherine; Whiteley, William; McIntosh, Andrew; Whalley, Heather; Alex, Beatrice Comparison of Rule-based and Neural Network Models for Negation Detection in Radiology Reports Journal Article Natural Language Engineering, 2020, ISSN: 1351-3249. @article{abfc8d8396724fc8ba698187e40dc604, title = {Comparison of Rule-based and Neural Network Models for Negation Detection in Radiology Reports}, author = {Dominic Sykes and Andreas Grivas and Claire Grover and Richard Tobin and Catherine Sudlow and William Whiteley and Andrew McIntosh and Heather Whalley and Beatrice Alex}, issn = {1351-3249}, year = {2020}, date = {2020-03-09}, journal = {Natural Language Engineering}, publisher = {Cambridge University Press}, abstract = {Using natural language processing it is possible to extract structured information from raw text in the Electronic Health Record (EHR) at reasonably high accuracy. However, the accurate distinction between negated and non-negated mentions of clinical terms remains a challenge. EHR text includes cases where diseases are stated not to be present or only hypothesised, meaning a disease can be mentioned in a report when it is not being reported as present. This makes tasks such as document classification and summarisation more difficult. We have developed the rule-based EdIE-R-Neg, part of an existing text mining pipeline called EdIE-R (Edinburgh Information Extraction for Radiology reports), developed to process brain imaging reports, and two machine learning approaches: one using a bidirectional long short-term memory network and another using a feedforward neural network. These were developed on data from the Edinburgh Stroke Study (ESS) and tested on data from routine reports from NHS Tayside (Tayside). Both datasets consist of written reports from medical scans. These models are compared with two existing rule-based models: pyConText [Harkema et al., 2009], a Python implementation of a generalisation of NegEx, and NegBio [Peng et al., 2017], which identifies negation scopes through patterns applied to a syntactic representation of the sentence. On both the test set of the dataset from which our models were developed and the largely similar Tayside test set, the neural network models and our custom-built rule-based system outperformed the existing methods. EdIE-R-Neg scored highest on F1 score, particularly on the test set of the Tayside dataset, from which no development data was used in these experiments, showing the power of custom-built rule-based systems for negation detection on datasets of this size. The performance gap of the machine learning models to EdIE-R-Neg on the Tayside test set was reduced by adding development Tayside data into the ESS training set, demonstrating the adaptability of the neural network models.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Using natural language processing it is possible to extract structured information from raw text in the Electronic Health Record (EHR) at reasonably high accuracy. However, the accurate distinction between negated and non-negated mentions of clinical terms remains a challenge. EHR text includes cases where diseases are stated not to be present or only hypothesised, meaning a disease can be mentioned in a report when it is not being reported as present. This makes tasks such as document classification and summarisation more difficult. We have developed the rule-based EdIE-R-Neg, part of an existing text mining pipeline called EdIE-R (Edinburgh Information Extraction for Radiology reports), developed to process brain imaging reports, and two machine learning approaches: one using a bidirectional long short-term memory network and another using a feedforward neural network. These were developed on data from the Edinburgh Stroke Study (ESS) and tested on data from routine reports from NHS Tayside (Tayside). Both datasets consist of written reports from medical scans. These models are compared with two existing rule-based models: pyConText [Harkema et al., 2009], a Python implementation of a generalisation of NegEx, and NegBio [Peng et al., 2017], which identifies negation scopes through patterns applied to a syntactic representation of the sentence. On both the test set of the dataset from which our models were developed and the largely similar Tayside test set, the neural network models and our custom-built rule-based system outperformed the existing methods. EdIE-R-Neg scored highest on F1 score, particularly on the test set of the Tayside dataset, from which no development data was used in these experiments, showing the power of custom-built rule-based systems for negation detection on datasets of this size. The performance gap of the machine learning models to EdIE-R-Neg on the Tayside test set was reduced by adding development Tayside data into the ESS training set, demonstrating the adaptability of the neural network models. |
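To make the trigger-and-scope idea behind NegEx-style systems such as pyConText concrete, here is a minimal sketch; the trigger list, five-token scope window, and example sentence are illustrative assumptions, not the configuration used by EdIE-R-Neg or the paper.

```python
# Minimal NegEx-style negation scoping (illustrative; real systems such as
# pyConText use large curated trigger lists and richer scope rules).
NEG_TRIGGERS = ["no evidence of", "without", "no", "not"]
SCOPE_WINDOW = 5  # tokens after a trigger assumed to fall in its scope

def negated_mentions(sentence, terms):
    """Return the subset of `terms` that occur inside a negation scope."""
    tokens = sentence.lower().split()
    negated = set()
    for i in range(len(tokens)):
        for trigger in NEG_TRIGGERS:
            t = trigger.split()
            if tokens[i:i + len(t)] == t:
                scope = " ".join(tokens[i + len(t):i + len(t) + SCOPE_WINDOW])
                negated.update(term for term in terms if term in scope)
    return negated

print(negated_mentions("There is no evidence of acute infarct or haemorrhage",
                       {"infarct", "haemorrhage"}))
# prints a set containing both terms, since both fall inside the trigger's scope
```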
Llewellyn, Clare; Orzechowski, Pawel; Alex, Beatrice Teaching a Text Mining Bootcamp in Lockdown Technical Report 2020. @techreport{abf3541925424df9b32da07bdf849f9f, title = {Teaching a Text Mining Bootcamp in Lockdown}, author = {Clare Llewellyn and Pawel Orzechowski and Beatrice Alex}, year = {2020}, date = {2020-01-01}, pages = {1--7}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } |
Dobreva, Radina; Zhou, Jie; Bawden, Rachel Document Sub-structure in Neural Machine Translation Inproceedings Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), pp. 3657–3667, European Language Resources Association (ELRA), 2020, (12th Language Resources and Evaluation Conference, LREC 2020 ; Conference date: 11-05-2020 Through 16-05-2020). @inproceedings{e86a066bd4244e50afe6bf631ac668a4, title = {Document Sub-structure in Neural Machine Translation}, author = {Radina Dobreva and Jie Zhou and Rachel Bawden}, url = {https://lrec2020.lrec-conf.org/en/}, year = {2020}, date = {2020-01-01}, booktitle = {Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020)}, pages = {3657–3667}, publisher = {European Language Resources Association (ELRA)}, abstract = {Current approaches to machine translation (MT) either translate sentences in isolation, disregarding the context they appear in, or model context at the level of the full document, without a notion of any internal structure the document may have. In this work we consider the fact that documents are rarely homogeneous blocks of text, but rather consist of parts covering different topics. Some documents, such as biographies and encyclopedia entries, have highly predictable, regular structures in which sections are characterised by different topics. We draw inspiration from Louis and Webber (2014) who use this information to improve statistical MT and transfer their proposal into the framework of neural MT. We compare two different methods of including information about the topic of the section within which each sentence is found: one using side constraints and the other using a cache-based model. We create and release the data on which we run our experiments – parallel corpora for three language pairs (Chinese-English, French-English, Bulgarian-English) from Wikipedia biographies, which we extract automatically, preserving the boundaries of sections within the articles.}, note = {12th Language Resources and Evaluation Conference, LREC 2020 ; Conference date: 11-05-2020 Through 16-05-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Current approaches to machine translation (MT) either translate sentences in isolation, disregarding the context they appear in, or model context at the level of the full document, without a notion of any internal structure the document may have. In this work we consider the fact that documents are rarely homogeneous blocks of text, but rather consist of parts covering different topics. Some documents, such as biographies and encyclopedia entries, have highly predictable, regular structures in which sections are characterised by different topics. We draw inspiration from Louis and Webber (2014) who use this information to improve statistical MT and transfer their proposal into the framework of neural MT. We compare two different methods of including information about the topic of the section within which each sentence is found: one using side constraints and the other using a cache-based model. We create and release the data on which we run our experiments – parallel corpora for three language pairs (Chinese-English, French-English, Bulgarian-English) from Wikipedia biographies, which we extract automatically, preserving the boundaries of sections within the articles. |
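The two methods compared in the paper can be pictured easily; for instance, side constraints are commonly realised by prepending a pseudo-token to the source sentence so the encoder conditions on the section topic like any other vocabulary item. A minimal preprocessing sketch (the tag format is a hypothetical choice, not necessarily the paper's):

```python
def add_topic_tag(source_sentence: str, topic: str) -> str:
    """Prepend a pseudo-token encoding the section topic (side constraint)."""
    return f"<topic_{topic}> {source_sentence}"

# Tagged sentences are then fed to a standard NMT pipeline unchanged.
print(add_topic_tag("Elle est née à Paris .", "early_life"))
# -> '<topic_early_life> Elle est née à Paris .'
```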
McCurdy, Kate; Goldwater, Sharon; Lopez, Adam Inflecting when there's no majority: Limitations of encoder-decoder neural networks as cognitive models for German plurals Inproceedings Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020, pp. 1745–1756, Association for Computational Linguistics (ACL), 2020, (2020 Annual Conference of the Association for Computational Linguistics, ACL 2020 ; Conference date: 05-07-2020 Through 10-07-2020). @inproceedings{3602db79fe624bc985ad1c2906953800, title = {Inflecting when there's no majority: Limitations of encoder-decoder neural networks as cognitive models for German plurals}, author = {Kate McCurdy and Sharon Goldwater and Adam Lopez}, url = {https://acl2020.org/}, doi = {10.18653/v1/2020.acl-main.159}, year = {2020}, date = {2020-01-01}, booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020}, pages = {1745–1756}, publisher = {Association for Computational Linguistics (ACL)}, abstract = {Can artificial neural networks learn to represent inflectional morphology and generalize to new words as human speakers do? Kirov and Cotterell (2018) argue that the answer is yes: modern Encoder-Decoder (ED) architectures learn human-like behavior when inflecting English verbs, such as extending the regular past tense form /-(e)d/ to novel words. However, their work does not address the criticism raised by Marcus et al. (1995): that neural models may learn to extend not the regular, but the most frequent class — and thus fail on tasks like German number inflection, where infrequent suffixes like /-s/ can still be productively generalized. To investigate this question, we first collect a new dataset from German speakers (production and ratings of plural forms for novel nouns) that is designed to avoid sources of information unavailable to the ED model. The speaker data show high variability, and two suffixes evince ‘regular’ behavior, appearing more often with phonologically atypical inputs. Encoder-decoder models do generalize the most frequently produced plural class, but do not show human-like variability or ‘regular’ extension of these other plural markers. We conclude that modern neural models may still struggle with minority-class generalization.}, note = {2020 Annual Conference of the Association for Computational Linguistics, ACL 2020 ; Conference date: 05-07-2020 Through 10-07-2020}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Can artificial neural networks learn to represent inflectional morphology and generalize to new words as human speakers do? Kirov and Cotterell (2018) argue that the answer is yes: modern Encoder-Decoder (ED) architectures learn human-like behavior when inflecting English verbs, such as extending the regular past tense form /-(e)d/ to novel words. However, their work does not address the criticism raised by Marcus et al. (1995): that neural models may learn to extend not the regular, but the most frequent class — and thus fail on tasks like German number inflection, where infrequent suffixes like /-s/ can still be productively generalized. To investigate this question, we first collect a new dataset from German speakers (production and ratings of plural forms for novel nouns) that is designed to avoid sources of information unavailable to the ED model. The speaker data show high variability, and two suffixes evince ‘regular’ behavior, appearing more often with phonologically atypical inputs. Encoder-decoder models do generalize the most frequently produced plural class, but do not show human-like variability or ‘regular’ extension of these other plural markers. We conclude that modern neural models may still struggle with minority-class generalization. |
Matusevych, Yevgen; Kamper, Herman; Goldwater, Sharon Analyzing Autoencoder-Based Acoustic Word Embeddings Conference 2020, (Bridging AI and Cognitive Science Workshop @ ICLR 2020, BAICS 2020 ; Conference date: 26-04-2020 Through 26-04-2020). @conference{f0ec16b23121419aa9fa315dae41e38d, title = {Analyzing Autoencoder-Based Acoustic Word Embeddings}, author = {Yevgen Matusevych and Herman Kamper and Sharon Goldwater}, url = {https://baicsworkshop.github.io/}, year = {2020}, date = {2020-01-01}, pages = {1--6}, abstract = {Recent studies have introduced methods for learning acoustic word embeddings (AWEs)—fixed-size vector representations of words which encode their acoustic features. Despite the widespread use of AWEs in speech processing research, they have only been evaluated quantitatively in their ability to discriminate between whole word tokens. To better understand the applications of AWEs in various downstream tasks and in cognitive modeling, we need to analyze the representation spaces of AWEs. Here we analyze basic properties of AWE spaces learned by a sequence-to-sequence encoder-decoder model in six typologically diverse languages. We first show that these AWEs preserve some information about words' absolute duration and speaker. At the same time, the representation space of these AWEs is organized such that the distance between words' embeddings increases with those words' phonetic dissimilarity. Finally, the AWEs exhibit a word onset bias, similar to patterns reported in various studies on human speech processing and lexical access. We argue this is a promising result and encourage further evaluation of AWEs as a potentially useful tool in cognitive science, which could provide a link between speech processing and lexical memory.}, note = {Bridging AI and Cognitive Science Workshop @ ICLR 2020, BAICS 2020 ; Conference date: 26-04-2020 Through 26-04-2020}, keywords = {}, pubstate = {published}, tppubtype = {conference} } Recent studies have introduced methods for learning acoustic word embeddings (AWEs)—fixed-size vector representations of words which encode their acoustic features. Despite the widespread use of AWEs in speech processing research, they have only been evaluated quantitatively in their ability to discriminate between whole word tokens. To better understand the applications of AWEs in various downstream tasks and in cognitive modeling, we need to analyze the representation spaces of AWEs. Here we analyze basic properties of AWE spaces learned by a sequence-to-sequence encoder-decoder model in six typologically diverse languages. We first show that these AWEs preserve some information about words' absolute duration and speaker. At the same time, the representation space of these AWEs is organized such that the distance between words' embeddings increases with those words' phonetic dissimilarity. Finally, the AWEs exhibit a word onset bias, similar to patterns reported in various studies on human speech processing and lexical access. We argue this is a promising result and encourage further evaluation of AWEs as a potentially useful tool in cognitive science, which could provide a link between speech processing and lexical memory. |
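The reported link between embedding distance and phonetic dissimilarity can be checked on any set of AWEs with a rank correlation between pairwise embedding distances and phone-string edit distances. The sketch below uses toy vectors and transcriptions, and the choice of cosine distance and Levenshtein distance is an assumption rather than the paper's exact protocol.

```python
from itertools import combinations
import numpy as np
from scipy.spatial.distance import cosine
from scipy.stats import spearmanr

def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0], d[0, :] = np.arange(len(a) + 1), np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1,
                          d[i - 1, j - 1] + int(a[i - 1] != b[j - 1]))
    return int(d[len(a), len(b)])

# Toy stand-ins: (phone transcription, acoustic word embedding) per word.
words = {"cat": (["k", "ae", "t"], np.array([0.9, 0.1])),
         "cap": (["k", "ae", "p"], np.array([0.8, 0.2])),
         "dog": (["d", "ao", "g"], np.array([0.1, 0.9]))}

emb_d, phon_d = [], []
for w1, w2 in combinations(words, 2):
    emb_d.append(cosine(words[w1][1], words[w2][1]))
    phon_d.append(edit_distance(words[w1][0], words[w2][0]))

# A positive rank correlation means embedding distance tracks dissimilarity.
print(spearmanr(emb_d, phon_d))
```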
Steedman, Mark A Formal Universal of Natural Language Grammar Journal Article Language, 96, 2020, ISSN: 0097-8507. @article{782c1b3c38834d58a998a309d871a7f8, title = {A Formal Universal of Natural Language Grammar}, author = {Mark Steedman}, issn = {0097-8507}, year = {2020}, date = {2020-01-01}, journal = {Language}, volume = {96}, publisher = {Linguistic Society of America}, abstract = {This paper proposes that the possible word-orders for any natural language construction composed of n elements, each of which selects for the category headed by the next, are universally limited both across and within languages to a subclass of permutations on the “universal order of command” 1, …, n, as determined by their selectional restrictions. The permitted subclass is known as the “separable” permutations, and grows in n as the Large Schröder Series 1, 2, 6, 22, 90, 394, 1806, …. This universal is identified as formal because it follows directly from the assumptions of Combinatory Categorial Grammar (CCG)—in particular, from the fact that all CCG syntactic rules are subject to a Combinatory Projection Principle that limits them to binary rules applying to contiguous non-empty categories. The paper presents quantitative empirical evidence in support of this claim from the linguistically attested orders of the four elements Dem(onstrative), Num(eral), A(djective), N(oun) that have been examined in connection with various versions of Greenberg's putative 20th Universal concerning their order. A universal restriction to separable permutations is also supported by word-order variation in the Germanic verb cluster, and in the Hungarian verb complex, among other constructions.}, keywords = {}, pubstate = {published}, tppubtype = {article} } This paper proposes that the possible word-orders for any natural language construction composed of n elements, each of which selects for the category headed by the next, are universally limited both across and within languages to a subclass of permutations on the “universal order of command” 1, …, n, as determined by their selectional restrictions. The permitted subclass is known as the “separable” permutations, and grows in n as the Large Schröder Series 1, 2, 6, 22, 90, 394, 1806, …. This universal is identified as formal because it follows directly from the assumptions of Combinatory Categorial Grammar (CCG)—in particular, from the fact that all CCG syntactic rules are subject to a Combinatory Projection Principle that limits them to binary rules applying to contiguous non-empty categories. The paper presents quantitative empirical evidence in support of this claim from the linguistically attested orders of the four elements Dem(onstrative), Num(eral), A(djective), N(oun) that have been examined in connection with various versions of Greenberg's putative 20th Universal concerning their order. A universal restriction to separable permutations is also supported by word-order variation in the Germanic verb cluster, and in the Hungarian verb complex, among other constructions. |
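The Large Schröder series quoted in the abstract is standard combinatorics and satisfies the recurrence (n+1)S(n) = 3(2n-1)S(n-1) - (n-2)S(n-2); a few lines of Python reproduce the counts of separable permutations given above (this is a check of the arithmetic, not code from the paper):

```python
def large_schroeder(n_terms: int) -> list:
    """Large Schröder numbers via (n+1)S(n) = 3(2n-1)S(n-1) - (n-2)S(n-2)."""
    s = [1, 2]
    for n in range(2, n_terms):
        s.append((3 * (2 * n - 1) * s[n - 1] - (n - 2) * s[n - 2]) // (n + 1))
    return s[:n_terms]

print(large_schroeder(7))  # [1, 2, 6, 22, 90, 394, 1806], as in the abstract
```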
2019 |
Zhang, Biao; Sennrich, Rico Root Mean Square Layer Normalization Inproceedings Advances in Neural Information Processing Systems 32, pp. 12360–12371, Curran Associates Inc, 2019, (33rd Conference on Neural Information Processing Systems, NeurIPS 2019 ; Conference date: 08-12-2019 Through 14-12-2019). @inproceedings{04753de48bb74071980fe19fa1df572c, title = {Root Mean Square Layer Normalization}, author = {Biao Zhang and Rico Sennrich}, url = {https://neurips.cc/}, year = {2019}, date = {2019-12-14}, booktitle = {Advances in Neural Information Processing Systems 32}, volume = {32}, pages = {12360--12371}, publisher = {Curran Associates Inc}, abstract = {Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling re-centering and re-scaling of both inputs and weight matrix. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, e.g. RNN in particular. In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs to a neuron in one layer according to root mean square (RMS), giving the model re-scaling invariance property and implicit learning rate adaptation ability. RMSNorm is computationally simpler and thus more efficient than LayerNorm. We also present partial RMSNorm, or pRMSNorm where the RMS is estimated from p% of the summed inputs without breaking the above properties. Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves comparable performance against LayerNorm but reduces the running time by 7%∼64% on different models.}, note = {33rd Conference on Neural Information Processing Systems, NeurIPS 2019 ; Conference date: 08-12-2019 Through 14-12-2019}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling re-centering and re-scaling of both inputs and weight matrix. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, e.g. RNN in particular. In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm. RMSNorm regularizes the summed inputs to a neuron in one layer according to root mean square (RMS), giving the model re-scaling invariance property and implicit learning rate adaptation ability. RMSNorm is computationally simpler and thus more efficient than LayerNorm. We also present partial RMSNorm, or pRMSNorm where the RMS is estimated from p% of the summed inputs without breaking the above properties. Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves comparable performance against LayerNorm but reduces the running time by 7%∼64% on different models. |
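The abstract fully specifies the RMSNorm statistic, so a NumPy sketch is straightforward; the learnable gain and the epsilon for numerical stability are standard details assumed here rather than quoted from the paper.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-8):
    """RMSNorm: rescale by the root mean square of the summed inputs,
    skipping LayerNorm's mean subtraction (no re-centering)."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * gain

def p_rms_norm(x, gain, p=0.25, eps=1e-8):
    """pRMSNorm: estimate the RMS from the first p% of the summed inputs."""
    k = max(1, int(x.shape[-1] * p))
    rms = np.sqrt(np.mean(x[..., :k] ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * gain

x = np.random.randn(2, 8)
print(rms_norm(x, gain=np.ones(8)).shape)    # (2, 8)
print(p_rms_norm(x, gain=np.ones(8)).shape)  # (2, 8)
```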
Mourad, Ahmed; Scholer, Falk; Magdy, Walid; Sanderson, Mark A Practical Guide for the Effective Evaluation of Twitter User Geolocation Journal Article ACM Transactions on Social Computing (TSC), 2 (3), 2019, ISSN: 2469-7818. @article{c09adc4e16e142348b590f77d7bef639b, title = {A Practical Guide for the Effective Evaluation of Twitter User Geolocation}, author = {Ahmed Mourad and Falk Scholer and Walid Magdy and Mark Sanderson}, doi = {10.1145/3352572}, issn = {2469-7818}, year = {2019}, date = {2019-12-07}, journal = {ACM Transactions on Social Computing (TSC)}, volume = {2}, number = {3}, publisher = {Association for Computing Machinery (ACM)}, abstract = {Geolocating Twitter users---the task of identifying their home locations---serves a wide range of community and business applications such as managing natural crises, journalism, and public health. Many approaches have been proposed for automatically geolocating users based on their tweets; at the same time, various evaluation metrics have been proposed to measure the effectiveness of these approaches, making it challenging to understand which of these metrics is the most suitable for this task. In this paper, we propose a guide for a standardized evaluation of Twitter user geolocation by analyzing fifteen models and two baselines in a controlled experimental setting. Models are evaluated using ten metrics over four geographic granularities. We use rank correlations to assess the effectiveness of these metrics. Our results demonstrate that the choice of effectiveness metric can have a substantial impact on the conclusions drawn from a geolocation system experiment, potentially leading experimenters to contradictory results about relative effectiveness. We show that for general evaluations, a range of performance metrics should be reported, to ensure that a complete picture of system effectiveness is conveyed. Given the global geographic coverage of this task, we specifically recommend evaluation at micro versus macro levels to measure the impact of the bias in distribution over locations. Although a lot of complex geolocation algorithms have been applied in recent years, a majority class baseline is still competitive at coarse geographic granularity. We propose a suite of statistical analysis tests, based on the employed metric, to ensure that the results are not coincidental.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Geolocating Twitter users---the task of identifying their home locations---serves a wide range of community and business applications such as managing natural crises, journalism, and public health. Many approaches have been proposed for automatically geolocating users based on their tweets; at the same time, various evaluation metrics have been proposed to measure the effectiveness of these approaches, making it challenging to understand which of these metrics is the most suitable for this task. In this paper, we propose a guide for a standardized evaluation of Twitter user geolocation by analyzing fifteen models and two baselines in a controlled experimental setting. Models are evaluated using ten metrics over four geographic granularities. We use rank correlations to assess the effectiveness of these metrics. Our results demonstrate that the choice of effectiveness metric can have a substantial impact on the conclusions drawn from a geolocation system experiment, potentially leading experimenters to contradictory results about relative effectiveness. 
We show that for general evaluations, a range of performance metrics should be reported, to ensure that a complete picture of system effectiveness is conveyed. Given the global geographic coverage of this task, we specifically recommend evaluation at micro versus macro levels to measure the impact of the bias in distribution over locations. Although a lot of complex geolocation algorithms have been applied in recent years, a majority class baseline is still competitive at coarse geographic granularity. We propose a suite of statistical analysis tests, based on the employed metric, to ensure that the results are not coincidental. |
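The recommendation to evaluate at micro versus macro levels is easy to make concrete: micro-averaging scores every user equally (so populous locations dominate), while macro-averaging scores every location equally. A small sketch with invented city labels:

```python
from collections import defaultdict

# (gold_city, predicted_city) pairs; values are invented for illustration.
preds = [("london", "london"), ("london", "london"),
         ("london", "paris"), ("cairo", "london")]

# Micro: each user counts once, so the skew toward 'london' dominates.
micro = sum(g == p for g, p in preds) / len(preds)

# Macro: average of per-location accuracies, so each location counts once.
per_city = defaultdict(list)
for g, p in preds:
    per_city[g].append(g == p)
macro = sum(sum(v) / len(v) for v in per_city.values()) / len(per_city)

print(micro, macro)  # 0.5 vs ~0.33: the distributional bias inflates micro
```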
Alex, Beatrice; Grover, Claire; Tobin, Richard; Sudlow, Catherine; Mair, Grant; Whiteley, William Text Mining Brain Imaging Reports Journal Article Journal of Biomedical Semantics, 10, 2019, ISSN: 2041-1480. @article{74ee8f4642e94ae1b07aa99d5a0c1e70b, title = {Text Mining Brain Imaging Reports}, author = {Beatrice Alex and Claire Grover and Richard Tobin and Catherine Sudlow and Grant Mair and William Whiteley}, doi = {10.1186/s13326-019-0211-7}, issn = {2041-1480}, year = {2019}, date = {2019-11-12}, journal = {Journal of Biomedical Semantics}, volume = {10}, publisher = {BioMed Central}, abstract = {Background: With the improvements to text mining technology and the availability of large unstructured Electronic Healthcare Records (EHR) datasets, it is now possible to extract structured information from raw text contained within EHR at reasonably high accuracy. We describe a text mining system for classifying radiologists' reports of CT and MRI brain scans, assigning labels indicating occurrence and type of stroke, as well as other observations. Our system, the Edinburgh Information Extraction for Radiology reports (EdIE-R) system, which we describe here, was developed and tested on a collection of radiology reports. Methods: The work reported in this paper is based on 1,168 radiology reports from the Edinburgh Stroke Study (ESS) [1], a hospital-based register of stroke and transient ischaemic attack patients. We manually created annotations for this data in parallel with developing the rule-based EdIE-R system to identify phenotype information related to stroke in radiology reports. This process was iterative and domain expert feedback was considered at each iteration to adapt and tune the EdIE-R text mining system, which identifies entities, negation and relations between entities in each report and determines report-level labels. Results: The inter-annotator agreement (IAA) for all types of annotations is high at 96.96 for entities, 96.46 for negation, 95.84 for relations and 94.02 for labels. The equivalent system scores on the blind test set are equally high at 95.49 for entities, 94.41 for negation, 98.27 for relations and 96.39 for labels for the first annotator and 96.86, 96.01, 96.53 and 92.61, respectively, for the second annotator. Conclusion: Automated reading of such EHR data at such high levels of accuracy opens up avenues for population health monitoring and audit, and can provide a resource for epidemiological studies. We are in the process of validating EdIE-R in separate larger cohorts in NHS England and Scotland. The manually annotated ESS corpus will be available for research purposes on application.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Background: With the improvements to text mining technology and the availability of large unstructured Electronic Healthcare Records (EHR) datasets, it is now possible to extract structured information from raw text contained within EHR at reasonably high accuracy. We describe a text mining system for classifying radiologists' reports of CT and MRI brain scans, assigning labels indicating occurrence and type of stroke, as well as other observations. Our system, the Edinburgh Information Extraction for Radiology reports (EdIE-R) system, which we describe here, was developed and tested on a collection of radiology reports. Methods: The work reported in this paper is based on 1,168 radiology reports from the Edinburgh Stroke Study (ESS) [1], a hospital-based register of stroke and transient ischaemic attack patients. We manually created annotations for this data in parallel with developing the rule-based EdIE-R system to identify phenotype information related to stroke in radiology reports. This process was iterative and domain expert feedback was considered at each iteration to adapt and tune the EdIE-R text mining system, which identifies entities, negation and relations between entities in each report and determines report-level labels. Results: The inter-annotator agreement (IAA) for all types of annotations is high at 96.96 for entities, 96.46 for negation, 95.84 for relations and 94.02 for labels. The equivalent system scores on the blind test set are equally high at 95.49 for entities, 94.41 for negation, 98.27 for relations and 96.39 for labels for the first annotator and 96.86, 96.01, 96.53 and 92.61, respectively, for the second annotator. Conclusion: Automated reading of such EHR data at such high levels of accuracy opens up avenues for population health monitoring and audit, and can provide a resource for epidemiological studies. We are in the process of validating EdIE-R in separate larger cohorts in NHS England and Scotland. The manually annotated ESS corpus will be available for research purposes on application. |
Shen, Yiting; Wilson, Steven; Mihalcea, Rada Measuring Personal Values in Cross-Cultural User-Generated Content. Inproceedings International Conference on Social Informatics, pp. 143–155, Springer, Cham, 2019, ISBN: 978-3-030-34970-7, (11th International Conference on Social Informatics, SocInfo 2019 ; Conference date: 18-11-2019 Through 21-11-2019). @inproceedings{c30320b9b2af479e93e0b3c27ceae78b, title = {Measuring Personal Values in Cross-Cultural User-Generated Content.}, author = {Yiting Shen and Steven Wilson and Rada Mihalcea}, url = {https://socinfo2019.qcri.org/}, doi = {10.1007/978-3-030-34971-4_10}, isbn = {978-3-030-34970-7}, year = {2019}, date = {2019-11-11}, booktitle = {International Conference on Social Informatics}, pages = {143--155}, publisher = {Springer, Cham}, series = {Lecture Notes in Computer Science (LNCS)}, abstract = {There are several standard methods used to measure personal values, including the Schwartz Values Survey and the World Values Survey. While these tools are based on well-established questionnaires, they are expensive to administer at a large scale and rely on respondents to self-report their values rather than observing what people actually choose to write about. We employ a lexicon-based method that can computationally measure personal values on a large scale. Our approach is not limited to word-counting as we explore and evaluate several alternative approaches to quantifying the usage of value-related themes in a given document. We apply our methodology to a large blog dataset comprised of text written by users from different countries around the world in order to quantify cultural differences in the expression of personal values on blogs. Additionally, we analyze the relationship between the value themes expressed in blog posts and the values measured for some of the same countries using the World Values Survey.}, note = {11th International Conference on Social Informatics, SocInfo 2019 ; Conference date: 18-11-2019 Through 21-11-2019}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } There are several standard methods used to measure personal values, including the Schwartz Values Survey and the World Values Survey. While these tools are based on well-established questionnaires, they are expensive to administer at a large scale and rely on respondents to self-report their values rather than observing what people actually choose to write about. We employ a lexicon-based method that can computationally measure personal values on a large scale. Our approach is not limited to word-counting as we explore and evaluate several alternative approaches to quantifying the usage of value-related themes in a given document. We apply our methodology to a large blog dataset comprised of text written by users from different countries around the world in order to quantify cultural differences in the expression of personal values on blogs. Additionally, we analyze the relationship between the value themes expressed in blog posts and the values measured for some of the same countries using the World Values Survey. |
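As a point of reference for the lexicon-based method, the simplest word-counting variant (which the paper extends with alternative quantification schemes) can be sketched in a few lines; the two-theme lexicon below is invented for illustration, not taken from the Schwartz or World Values instruments.

```python
# Toy value lexicon; real theme lexicons are derived from values surveys.
LEXICON = {"family": {"mother", "father", "family", "home"},
           "achievement": {"win", "success", "goal", "career"}}

def value_scores(text: str) -> dict:
    """Relative frequency of each value theme's words in a document."""
    tokens = text.lower().split()
    return {theme: sum(tok in words for tok in tokens) / len(tokens)
            for theme, words in LEXICON.items()}

print(value_scores("my family is my greatest success"))
# -> {'family': 0.1666..., 'achievement': 0.1666...}
```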
Aldayel, Abeer; Magdy, Walid Assessing Sentiment of the Expressed Stance on Social Media Inproceedings Social Informatics, Springer, Cham, 2019, ISBN: 978-3-030-34970-7, (11th International Conference on Social Informatics, SocInfo 2019 ; Conference date: 18-11-2019 Through 21-11-2019). @inproceedings{50bb512e36e0414ebae4febc50a2c079b, title = {Assessing Sentiment of the Expressed Stance on Social Media}, author = {Abeer Aldayel and Walid Magdy}, url = {https://socinfo2019.qcri.org/}, doi = {10.1007/978-3-030-34971-4_19}, isbn = {978-3-030-34970-7}, year = {2019}, date = {2019-11-11}, booktitle = {Social Informatics}, number = {11864}, publisher = {Springer, Cham}, series = {Lecture Notes in Computer Sciences (LNCS)}, abstract = {Stance detection is the task of inferring the viewpoint towards a given topic or entity as either supportive or opposing. One may express a viewpoint towards a topic by using positive or negative language. This paper examines how stance is expressed in social media according to the sentiment polarity. There has been a noticeable misconception of the similarity between stance and sentiment when it comes to viewpoint discovery, where negative sentiment is assumed to mean an against stance, and positive sentiment an in-favour stance. To analyze the relation between stance and sentiment, we construct a new dataset with four topics and examine how people express their viewpoint with regard to these topics. We validate our results by carrying out a further analysis of the popular SemEval stance benchmark dataset. Our analyses reveal that sentiment and stance are not highly aligned, and hence the simple sentiment polarity cannot be used solely to denote a stance toward a given topic.}, note = {11th International Conference on Social Informatics, SocInfo 2019 ; Conference date: 18-11-2019 Through 21-11-2019}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Stance detection is the task of inferring the viewpoint towards a given topic or entity as either supportive or opposing. One may express a viewpoint towards a topic by using positive or negative language. This paper examines how stance is expressed in social media according to the sentiment polarity. There has been a noticeable misconception of the similarity between stance and sentiment when it comes to viewpoint discovery, where negative sentiment is assumed to mean an against stance, and positive sentiment an in-favour stance. To analyze the relation between stance and sentiment, we construct a new dataset with four topics and examine how people express their viewpoint with regard to these topics. We validate our results by carrying out a further analysis of the popular SemEval stance benchmark dataset. Our analyses reveal that sentiment and stance are not highly aligned, and hence the simple sentiment polarity cannot be used solely to denote a stance toward a given topic. |