Exploiting structural features
The work of the text analysis network in the first phase was strongly influenced by the Words as Data principle, and the focus was mostly on applying and improving statistical methods for text. These methods make very limited use of advanced pre-processing methods from computational linguistics. In particular, there have been no attempts to use the structure of text, either within a sentence or between sentences. Such structural features, however, can be very useful for fine-grained analysis that goes beyond the classification of a complete document and looks at details inside documents. In the second period, the Political Text Analysis Network will analyse applications of methods that exploit the structure of documents, paragraphs, and sentences to provide more fine-grained insights.
Using background knowledge
Text analysis approaches investigated in the context of the SFB so far have been based solely on text as input. It is well known, however, that the analysis of texts can benefit greatly from background knowledge where it is available. In the context of analysing political speeches, for instance, it is very useful to know the speaker, his or her affiliation with a party or other interest group, and the audience. In the course of the project, we will investigate how background knowledge about speakers, interest groups, and their relations can be used to interpret and organise political texts more accurately. For this purpose, a knowledge base of political actors, interest groups, and their relations will be built as a basis for further analyses.
Subjective, vague and personal language
Alongside the extension of the text analysis methods during the second phase, the focus of investigation will be extended from the analysis of official documents to less formal documents, including speeches, interviews, and personal statements of political actors. A special focus of the investigations will be the use of subjective language as an indicator of the political position of the speaker. We will specifically look at methods for measuring the sentiment, emotion, and degree of vagueness of statements. This task often goes beyond the previously investigated methods for determining political positions, and requires structural analysis as well as background knowledge.
Media as sources of political opinion
While the work related to text analysis in the first phase of the SFB has mostly focused on parties as authors of political documents, the second phase will extend the scope to a wider range of textual documents. Traditional media such as newspapers can typically be treated in the same way as official party documents, because existing text analysis technology is often trained on newspaper articles. New types of media, that is, social media, pose a new challenge for text analysis methods. In addition to the intense use of subjective language, texts from these sources tend to suffer from poor grammar and spelling mistakes. On the other hand, media such as forums often come with an explicit discourse structure that can be exploited in the analysis.