Overview
Enter a search term or use the Advanced Search to filter by author, year of publication, or document type.
-
(2023): Integrating Economic Theory, Domain Knowledge, and Social Knowledge into Hybrid Sentiment Models for Predicting Crude Oil Markets. In: Cognitive Computation, last checked on 31.03.2023
Abstract: For several decades, sentiment analysis has been considered a key indicator for assessing market mood and predicting future price changes. Accurately predicting commodity markets requires an understanding of fundamental market dynamics such as the interplay between supply and demand, which are not considered in standard affective models. This paper introduces two domain-specific affective models, CrudeBERT and CrudeBERT+, that adapt sentiment analysis to the crude oil market by incorporating economic theory with common knowledge of the mentioned entities and social knowledge extracted from Google Trends. To evaluate the predictive capabilities of these models, comprehensive experiments were conducted using dynamic time warping to identify the model that best approximates WTI crude oil futures price movements. The evaluation included news headlines and crude oil prices between January 2012 and April 2021. The results show that CrudeBERT+ outperformed RavenPack, BERT, FinBERT, and early CrudeBERT models during the 9-year evaluation period and within most of the individual years that were analyzed. The success of the introduced domain-specific affective models demonstrates the potential of integrating economic theory with sentiment analysis and external knowledge sources to improve the predictive power of financial sentiment analysis models. The experiments also confirm that CrudeBERT+ has the potential to provide valuable insights for decision-making in the crude oil market.
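The evaluation technique named above, dynamic time warping (DTW), can be sketched in a few lines. This is an illustrative toy, not the authors' implementation, and the two sequences below are invented:

```python
def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance
    between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: DTW distance between the prefixes a[:i] and b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# A series that lags behind another still aligns closely under DTW,
# which is why it suits comparing sentiment signals to price curves:
sentiment = [0.1, 0.5, 0.9, 0.4, 0.1]
price     = [0.1, 0.1, 0.5, 0.9, 0.4]
print(dtw_distance(sentiment, price))
```

Unlike a pointwise Euclidean comparison, DTW tolerates the temporal shifts that naturally occur between news sentiment and subsequent price movements.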
-
(2023): Ontologien und Linked Open Data. In: Kuhlen, Rainer; Lewandowski, Dirk; Semar, Wolfgang; Womser-Hacker, Christa (Eds.): Grundlagen der Informationswissenschaft: 7th, completely revised edition: Berlin: De Gruyter, pp. 257-269. Available online at https://doi.org/10.1515/9783110769043-022, last checked on 16.12.2022
-
(2022): Compliance-Untersuchungen im Zeitalter von Big Data und künstlicher Intelligenz. In: Compliance-Berater 10. Available online at https://www.researchgate.net/publication/361276309_Compliance-Untersuchungen_im_Zeitalter_von_Big_Data_und_kunstlicher_Intelligenz, last checked on 23.06.2022
Abstract: IT-supported tools have been used in compliance investigations for more than two decades. Over time, both the scope of application and the methods have changed considerably. On the one hand, the volume of documents, data, and data types to be processed is growing massively. On the other hand, the technical methods for processing data are becoming ever more powerful. The current question is to what extent new technologies from the fields of big data and artificial intelligence (AI) can unlock automation potential that allows compliance investigations to be carried out better, faster, and at lower cost. This article outlines the current state of practice as well as development potential in the near future.
-
(2022): Slot Filling for Extracting Reskilling and Upskilling Options from the Web. 27th International Conference on Natural Language & Information Systems (NLDB). Universitat Politècnica de València. Valencia, 17 June 2022. Available online at https://www.youtube.com/watch?v=rIhhKjJAMnY&t=2608s, last checked on 24.11.2022
-
(2022): Building Knowledge Graphs and Recommender Systems for Suggesting Reskilling and Upskilling Options from the Web. In: Information 13. Available online at https://doi.org/10.3390/info13110510, last checked on 24.11.2022
Abstract: As advances in science and technology, crises, and increased competition impact labor markets, reskilling and upskilling programs have emerged to mitigate their effects. Since information on continuing education is highly distributed across websites, choosing career paths and suitable upskilling options is currently a challenging and cumbersome task. This article, therefore, introduces a method for building a comprehensive knowledge graph from the education providers' Web pages. We collect educational programs from 488 providers and leverage entity recognition and entity linking methods in conjunction with contextualization to extract knowledge on entities such as prerequisites, skills, learning objectives, and course content. Slot filling then integrates these entities into an extensive knowledge graph that contains close to 74,000 nodes and over 734,000 edges. A recommender system leverages the created graph and background knowledge on occupations to provide career path and upskilling suggestions. Finally, we evaluate the knowledge extraction approach on the CareerCoach 2022 gold standard and draw upon domain experts for judging the career paths and upskilling suggestions provided by the recommender system.
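The recommendation idea can be illustrated with a heavily simplified, hypothetical sketch. The course names, skill sets, and the coverage-ranking heuristic below are invented for illustration; the actual system operates on a knowledge graph with tens of thousands of nodes:

```python
# Courses teach skills; an occupation requires skills; we suggest the
# courses that cover the skills a person is still missing.
courses = {
    "Intro to Data Science": {"python", "statistics"},
    "Deep Learning Basics":  {"python", "neural networks"},
    "Project Management":    {"planning", "communication"},
}
occupation_skills = {
    "data scientist": {"python", "statistics", "neural networks"},
}

def suggest_courses(occupation, known_skills):
    missing = occupation_skills[occupation] - known_skills
    # rank courses by how many of the missing skills they cover
    ranked = sorted(courses, key=lambda c: len(courses[c] & missing),
                    reverse=True)
    return [c for c in ranked if courses[c] & missing]

print(suggest_courses("data scientist", {"python"}))
```

In the real system, graph traversal over extracted prerequisites and learning objectives would replace these hand-coded dictionaries.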
-
(2022): Career Coach. Automatische Wissensextraktion und Expertensystem für personalisierte Re- und Upskilling-Vorschläge. In: Forster, Michael; Alt, Sharon; Hanselmann, Marcel; Deflorin, Patricia (Eds.): Digitale Transformation an der Fachhochschule Graubünden: Case Studies aus Forschung und Lehre: Chur: FH Graubünden Verlag, pp. 11-18
Abstract: CareerCoach develops methods for the automatic extraction of continuing education offerings. The system analyzes the websites of education providers and integrates their offerings into a central knowledge graph that supports innovative services such as semantic search and expert systems.
-
(2021): Towards Developing an Integrity Risk Monitor (IRM). A Status Report. In: Makowicz, Bartosz: Global Ethics, Compliance & Integrity: Yearbook 2021: Bern: Peter Lang, pp. 123-131
Abstract: Risks, which could jeopardize the integrity of a company, are widespread. This holds true for firms located in Switzerland too. According to a recent study by PricewaterhouseCoopers (2018), almost 40 percent of Swiss companies have been affected by illegal and unethical behavior, such as embezzlement, cybercrime, intellectual property infringements, corruption, fraud, money laundering, and anti-competitive agreements. Although the number of cases in Switzerland is relatively low when compared to other countries globally, the financial damage for affected Swiss companies caused by these incidents is nevertheless above the global average.
-
(2021): Integrity Risk Monitor. Chur: FH Graubünden Verlag. Available online at https://www.fhgr.ch/fhgr/unternehmerisches-handeln/schweizerisches-institut-fuer-entrepreneurship-sife/projekte/integrity-risk-monitor-irm/, last checked on 17.03.2022
Abstract: Integrity in corporate governance has gained importance nationally and internationally in recent years. The business press repeatedly covers the behavior of companies that fail to live up to their corporate responsibility. At the same time, various stakeholder groups demand greater transparency from companies regarding their activities. This prompts companies to report in their non-financial reporting on their efforts toward ethical business conduct in the areas of human rights, the environment, and anti-corruption. As part of the Integrity Risk Monitor (IRM) research project, the IRM Portal and the IRM Dashboard were developed. These are web-based real-time monitoring instruments. The IRM Portal comprises media articles from the last 25 years drawn from a variety of sources. Furthermore, the algorithm continuously crawls the World Wide Web and collects new articles from editorial media. Using the IRM Dashboard's various analysis and visualization options, these can be examined to identify relationships, actors, sentiments, and main geographic regions. The project also examined companies' non-financial reporting in order to analyze relationships between media coverage and non-financial reporting. The results of the study show that both the media and the analyzed companies have reported more on the topics of human rights, the environment, and corruption over the last 25 years, but that for the time being there is no direct linear relationship between these two forms of reporting.
-
(2021): Automatic Expansion of Domain-Specific Affective Models for Web Intelligence Applications. In: Cognitive Computation. Available online at https://doi.org/10.1007/s12559-021-09839-4, last checked on 18.02.2021
Abstract: Sentic computing relies on well-defined affective models of different complexity—polarity to distinguish positive and negative sentiment, for example, or more nuanced models to capture expressions of human emotions. When used to measure communication success, even the most granular affective model combined with sophisticated machine learning approaches may not fully capture an organisation’s strategic positioning goals. Such goals often deviate from the assumptions of standardised affective models. While certain emotions such as Joy and Trust typically represent desirable brand associations, specific communication goals formulated by marketing professionals often go beyond such standard dimensions. For instance, the brand manager of a television show may consider fear or sadness to be desired emotions for its audience. This article introduces expansion techniques for affective models, combining common and commonsense knowledge available in knowledge graphs with language models and affective reasoning, improving coverage and consistency as well as supporting domain-specific interpretations of emotions. An extensive evaluation compares the performance of different expansion techniques: (i) a quantitative evaluation based on the revisited Hourglass of Emotions model to assess performance on complex models that cover multiple affective categories, using manually compiled gold standard data, and (ii) a qualitative evaluation of a domain-specific affective model for television programme brands. The results of these evaluations demonstrate that the introduced techniques support a variety of embeddings and pre-trained models. The paper concludes with a discussion on applying this approach to other scenarios where affective model resources are scarce.
-
(2021): Adapting Data-Driven Research to the Fields of Social Sciences and the Humanities. In: Future Internet 13. Available online at https://doi.org/10.3390/fi13030059, last checked on 18.05.2021
Abstract: Recent developments in the fields of computer science, such as advances in the areas of big data, knowledge extraction, and deep learning, have triggered the application of data-driven research methods to disciplines such as the social sciences and humanities. This article presents a collaborative, interdisciplinary process for adapting data-driven research to research questions within other disciplines, which considers the methodological background required to obtain a significant impact on the target discipline and guides the systematic collection and formalization of domain knowledge, as well as the selection of appropriate data sources and methods for analyzing, visualizing, and interpreting the results. Finally, we present a case study that applies the described process to the domain of communication science by creating approaches that aid domain experts in locating, tracking, analyzing, and, finally, better understanding the dynamics of media criticism. The study clearly demonstrates the potential of the presented method, but also shows that data-driven research approaches require a tighter integration with the methodological framework of the target discipline to achieve a truly significant impact.
-
(2021): Inscriptis: A Python-based HTML to text conversion library optimized for knowledge extraction from the Web. In: Journal of Open Source Software 6. Available online at https://doi.org/10.21105/joss.03557, last checked on 22.10.2021
Abstract: Inscriptis provides a library, command line client and Web service for converting HTML to plain text. Its development was triggered by the need to obtain accurate text representations for knowledge extraction tasks that preserve the spatial alignment of text without drawing upon heavyweight, browser-based solutions such as Selenium (Huggins et al., 2021). In contrast to existing software packages such as HTML2text (Swartz, 2021), jusText (Belica, 2021) and Lynx (Dickey, 2021), Inscriptis (1) provides a layout-aware conversion of HTML that more closely resembles the rendering obtained from standard Web browsers and therefore better preserves the spatial arrangement of text elements; it excels in conversion quality, since it correctly converts complex HTML constructs such as nested tables and also interprets a subset of HTML (e.g., align, valign) and CSS (e.g., display, white-space, margin-top, vertical-align) attributes that determine text alignment; and (2) supports annotation rules, i.e., user-provided mappings that allow for annotating the extracted text based on structural and semantic information encoded in the HTML tags and attributes of the original document. These unique features ensure that downstream knowledge extraction components can operate on accurate text representations, and may even use information on the semantics and structure of the original HTML document, if annotation support has been enabled.
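The idea of layout-aware conversion can be illustrated with a much-reduced sketch built on Python's standard html.parser. This is emphatically not Inscriptis itself, which handles nested tables, CSS attributes, and annotation rules far more thoroughly; it only shows the core principle that block-level elements must translate into line breaks rather than be flattened away:

```python
from html.parser import HTMLParser

class MiniTextConverter(HTMLParser):
    """Toy layout-aware HTML-to-text converter: elements that start
    a new line in a browser also start a new line in the output."""
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data.strip())

    def text(self):
        return "".join(self.parts).strip()

conv = MiniTextConverter()
conv.feed("<h1>Title</h1><p>First paragraph.</p><p>Second.</p>")
print(conv.text())
```

A naive tag-stripping approach would yield "TitleFirst paragraph.Second."; preserving block boundaries keeps the text usable for downstream extraction.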
-
(2020): In Media Res: A Corpus for Evaluating Named Entity Linking with Creative Works. In: Fernández, Raquel; Linzen, Tal (Eds.): Proceedings of the 24th Conference on Computational Natural Language Learning: CoNLL 2020: Online, 19-20 November: Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 355-364. Available online at https://doi.org/10.18653/v1/2020.conll-1.28, last checked on 21.05.2021
Abstract: Annotation styles express guidelines that direct human annotators in what rules to follow when creating gold standard annotations of text corpora. These guidelines not only shape the gold standards they help create, but also influence the training and evaluation of Named Entity Linking (NEL) tools, since different annotation styles correspond to divergent views on the entities present in the same texts. Such divergence is particularly present in texts from the media domain that contain references to creative works. In this work we present a corpus of 1000 annotated documents selected from the media domain. Each document is presented with multiple gold standard annotations representing various annotation styles. This corpus is used to evaluate a series of Named Entity Linking tools in order to understand the impact of the differences in annotation styles on the reported accuracy when processing highly ambiguous entities such as names of creative works. Relaxed annotation guidelines that include overlap styles lead to better results across all tools.
-
(2020): Optimierung von Unternehmensbewertungen durch automatisierte Wissensidentifikation, -extraktion und -integration. In: Information. Wissenschaft & Praxis 71, pp. 321-325. Available online at https://doi.org/10.1515/iwp-2020-2119, last checked on 30.10.2020
Abstract: Company valuations in the biotech, pharmaceutical, and medical technology industries are a demanding task, especially when the unique risks that biotech start-ups face when entering new markets are taken into account. Companies specializing in global valuation services therefore combine valuation models and past experience with heterogeneous metrics and indicators that provide insights into a company's performance. This article illustrates how automated knowledge identification, extraction, and integration can be used to (i) identify additional indicators that provide insights into a company's success in product development, and (ii) support labor-intensive data collection processes for company valuation.
-
(2020): Improving Company Valuations with Automated Knowledge Discovery, Extraction and Fusion. English translation of the article: "Optimierung von Unternehmensbewertungen durch automatisierte Wissensidentifikation, -extraktion und -integration". Information - Wissenschaft und Praxis 71 (5-6):321-325. Available online at https://arxiv.org/abs/2010.09249, last checked on 18.05.2021
Abstract: Performing company valuations within the domain of biotechnology, pharmacy and medical technology is a challenging task, especially when considering the unique set of risks biotech start-ups face when entering new markets. Companies specialized in global valuation services, therefore, combine valuation models and past experience with heterogeneous metrics and indicators that provide insights into a company's performance. This paper illustrates how automated knowledge discovery, extraction and data fusion can be used to (i) obtain additional indicators that provide insights into the success of a company's product development efforts, and (ii) support labor-intensive data curation processes. We apply deep web knowledge acquisition methods to identify and harvest data on clinical trials that is hidden behind proprietary search interfaces and integrate the extracted data into the industry partner's company valuation ontology. In addition, focused Web crawls and shallow semantic parsing yield information on the company's key personnel and respective contact data, notifying domain experts of relevant changes that get then incorporated into the industry partner's company data.
-
(2020): Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence. In: Chbeir, Richard; Manolopoulos, Yannis; Akerkar, Rajendra; Mizera-Pietraszko, Jolanta (Eds.): Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics: WIMS 2020: Biarritz, France, 30 June - 3 July: New York, NY, USA: Association for Computing Machinery (ACM), pp. 54-62. Available online at https://doi.org/10.1145/3405962.3405988, last checked on 21.05.2021
Abstract: A substantial number of international corporations have been affected by corruption. The research presented in this paper introduces the Integrity Risks Monitor, an analytics dashboard that applies Web Intelligence and Deep Learning to English- and German-language documents for the tasks of (i) tracking and visualizing past corruption management gaps and their respective impacts, (ii) understanding present and past integrity issues, and (iii) supporting companies in analyzing news media for identifying and mitigating integrity risks. Afterwards, we discuss the design, implementation, training and evaluation of classification components capable of identifying English documents covering the integrity topic of corruption. Domain experts created a gold standard dataset compiled from Anglo-American media coverage on corruption cases that has been used for training and evaluating the classifier. The experiments performed to evaluate the classifiers draw upon popular algorithms used for text classification such as Naïve Bayes, Support Vector Machines (SVM) and Deep Learning architectures (LSTM, BiLSTM, CNN) that use different word embeddings and document representations. They also demonstrate that although classical machine learning approaches such as Naïve Bayes struggle with the diversity of the media coverage on corruption, state-of-the-art Deep Learning models perform sufficiently well in the project's context.
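As a point of reference for the classical baselines named above, a minimal multinomial Naïve Bayes text classifier can be sketched as follows. The training snippets and labels are invented for illustration and are unrelated to the project's gold standard:

```python
import math
from collections import Counter, defaultdict

# Toy training set for a corruption/other split.
train = [
    ("officials accepted bribes in procurement scandal", "corruption"),
    ("court convicts manager of bribery and fraud", "corruption"),
    ("quarterly earnings beat analyst expectations", "other"),
    ("company launches new product line", "other"),
]

word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    scores = {}
    for label in class_counts:
        # log prior + log likelihoods with add-one smoothing
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1)
                              / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("prosecutors investigate bribery allegations"))
```

Exactly this kind of bag-of-words model struggles once the vocabulary of corruption coverage becomes diverse, which motivates the Deep Learning architectures evaluated in the paper.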
-
(2020): Harvest: An Open Source Toolkit for Extracting Posts and Post Metadata from Web Forums. The 20th IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology: A Hybrid Conference with both Online and Offline Modes: Melbourne, Australia, 14-17 December
Abstract: Web forums discuss topics of long-term, persisting involvements in domains such as health, mobile software development and online gaming, some of which are of high interest from a research and business perspective. In the medical domain, for example, forums contain information on symptoms, drug side effects and patient discussions that are highly relevant for patient-focused healthcare and drug development. Automatic extraction of forum posts and metadata is a crucial but challenging task since forums do not expose their content in a standardized structure. Content extraction methods, therefore, often need customizations such as adaptations to page templates and improvements of their extraction code before they can be deployed to new forums. Most of the current solutions are also built for the more general case of content extraction from web pages and lack key features important for understanding forum content such as the identification of author metadata and information on the thread structure. This paper, therefore, presents a method that determines the XPath of forum posts, eliminating incorrect mergers and splits of the extracted posts that were common in systems from the previous generation. Based on the individual posts, further metadata such as authors, forum URL and structure are extracted. We evaluate our approach by creating a gold standard which contains 102 forum pages from 52 different Web forums, and benchmarking against a baseline and competing tools.
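The core idea of locating the repeated path of forum posts can be sketched as follows. This is an illustrative toy, not the Harvest implementation; the class-attribute heuristic and the sample markup are assumptions made only for the example:

```python
from collections import Counter
from html.parser import HTMLParser

class PathCounter(HTMLParser):
    """Count the tag paths of elements whose class attribute hints at
    forum posts; the most frequent path is a candidate post container."""
    def __init__(self):
        super().__init__()
        self.stack, self.paths = [], Counter()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if "post" in dict(attrs).get("class", ""):
            self.paths["/".join(self.stack)] += 1

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

html = """
<html><body><div id="thread">
  <div class="post">First post</div>
  <div class="post">Second post</div>
  <div class="sidebar">Ads</div>
</div></body></html>
"""
pc = PathCounter()
pc.feed(html)
print(pc.paths.most_common(1))
```

Because every post shares the same container path while navigation and ads do not, frequency of the path is a robust signal even without per-forum templates.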
-
(2019): Introducing Orbis. An extendable evaluation pipeline for named entity linking performance drill-down analyses. In: Blake, Catherine; Brown, Cecelia (Eds.): 82nd Annual Meeting of The Association for Information Science: Proceedings, 56: ASIS&T 2019: Melbourne, Australia, 19-23 October: Somerset, NJ, USA: John Wiley & Sons, Ltd, pp. 468-471. Available online at https://doi.org/10.1002/pra2.49, last checked on 21.05.2021
Abstract: Most current evaluation tools focus solely on benchmarking and comparative evaluations and thus only provide aggregated statistics such as precision, recall and F1-measure to assess overall system performance. They do not offer comprehensive analyses down to the level of individual annotations. This paper introduces Orbis, an extendable evaluation pipeline framework developed to allow visual drill-down analyses of individual entities, computed by annotation services, in the context of the text they appear in and in reference to the entities specified in the gold standard.
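The aggregated statistics mentioned above, and the per-annotation detail they hide, can be sketched as follows. The entity triples are invented examples, and this is a generic span-level scorer rather than Orbis itself:

```python
# Entities are (start, end, link) triples; exact matches count as hits.
def score(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # the per-annotation detail an aggregated score hides:
    detail = {"missed": gold - predicted, "spurious": predicted - gold}
    return precision, recall, f1, detail

gold = {(0, 5, "dbpedia:Basel"), (10, 16, "dbpedia:Zurich")}
pred = {(0, 5, "dbpedia:Basel"), (20, 25, "dbpedia:Bern")}
p, r, f1, detail = score(gold, pred)
print(round(p, 2), round(r, 2), round(f1, 2), detail)
```

A drill-down tool surfaces exactly the `missed` and `spurious` sets per document, which aggregated F1 scores collapse into a single number.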
-
(2019): Datenakquiseprozesse mittels Big Data optimieren (Einblicke in die Forschung). Available online at https://www.fhgr.ch/fileadmin/publikationen/forschungsbericht/fhgr-Einblicke_in_die_Forschung_2019.pdf, last checked on 09.04.2021
Abstract: The DISCOVER project develops methods for the automatic acquisition, extraction, and integration of decision-relevant information from heterogeneous online sources; these methods are also capable of analyzing content from the Deep Web.
-
(2019): Name Variants for Improving Entity Discovery and Linking. In: Eskevich, Maria; Melo, Gerard de; Fäth, Christian; McCrae, John P.; Buitelaar, Paul; Chiarcos, Christian; Klimek, Bettina; Dojchinovski, Milan (Eds.): 2nd Conference on Language, Data and Knowledge: LDK 2019: Leipzig, 20-23 May: Saarbrücken/Wadern: Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing (OASIcs), pp. 14:1-14:15. Available online at https://doi.org/10.4230/OASIcs.LDK.2019.14, last checked on 21.05.2021
Abstract: Identifying all names that refer to a particular set of named entities is a challenging task, as quite often we need to consider many features that include a lot of variation like abbreviations, aliases, hypocorism, multilingualism or partial matches. Each entity type can also have specific rules for name variances: people names can include titles, country and branch names are sometimes removed from organization names, while locations are often plagued by the issue of nested entities. The lack of a clear strategy for collecting, processing and computing name variants significantly lowers the recall of tasks such as Named Entity Linking and Knowledge Base Population since name variances are frequently used in all kind of textual content. This paper proposes several strategies to address these issues. Recall can be improved by combining knowledge repositories and by computing additional variances based on algorithmic approaches. Heuristics and machine learning methods then analyze the generated name variances and mark ambiguous names to increase precision. An extensive evaluation demonstrates the effects of integrating these methods into a new Named Entity Linking framework and confirms that systematically considering name variances yields significant performance improvements.
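A few of the algorithmic variance heuristics mentioned in the abstract can be sketched as follows; the rules and the example name are illustrative only, and the paper combines such generated variants with knowledge repositories and ambiguity filtering:

```python
# Generate simple person-name variants: initial + surname,
# surname only, and the inverted "Surname, First" form.
def person_name_variants(name):
    parts = name.split()
    variants = {name}
    if len(parts) >= 2:
        first, last = parts[0], parts[-1]
        variants.add(f"{first[0]}. {last}")   # initial + surname
        variants.add(last)                    # surname only
        variants.add(f"{last}, {first}")      # inverted form
    return variants

print(sorted(person_name_variants("Maria Keller")))
```

Variants like the bare surname are highly ambiguous, which is why a subsequent precision step must mark them before they are used for linking.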
-
(2019): Improving Named Entity Linking Corpora Quality. In: Angelova, Galia; Mitkov, Ruslan; Nikolova, Ivelina; Temnikova, Irina (Eds.): Natural Language Processing in a Deep Learning World: Proceedings: International Conference Recent Advances in Natural Language Processing (RANLP 2019): Varna, Bulgaria, 2-4 September: Shoumen, Bulgaria: INCOMA Ltd., pp. 1328-1337. Available online at https://doi.org/10.26615/978-954-452-056-4_152, last checked on 21.05.2021
Abstract: Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems. The quality of the used corpora and the used evaluation metrics are crucial in this process. We, therefore, assess the quality of three popular evaluation corpora, identifying four major issues which affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) Knowledge Base evolution, and (iv) differences in annotating co-occurrences. This paper addresses these issues by formalizing NEL annotations and corpus versioning, which allows standardizing corpus creation, supports corpus evolution, and paves the way for the use of lenses to automatically transform between different corpus configurations. In addition, the use of clearly defined scoring rules and evaluation metrics ensures better comparability of evaluation results.
-
(2019): Capturing, analyzing and visualizing user generated content from social media. 27th Conference on Intelligent Systems for Molecular Biology (ISMB); 17th European Conference on Computational Biology (ECCB); Special session on Social media mining for drug discovery research: challenges and opportunities of Real World Text. Basel, 21-25 June 2019
Abstract: Source format variability and noise are major challenges when harvesting content from social media. This presentation discusses methods and abstractions for gathering user generated content from Web pages and social media platforms covering (i) structured content, (ii) platforms that leverage Semantic Web standards such as Microformats, RDFa and JSON-LD, and (iii) semi-structured or even unstructured content that is typically found in Web forums. We then discuss pre-processing and anonymization tasks and outline how the collected content is annotated, aggregated and summarized in a so-called contextualized information space. An interactive dashboard provides efficient means for analyzing, browsing and visualizing this information space. The dashboard supports analysts in identifying emerging trends and topics, exploring the lexical, geospatial and relational context of topics and entities such as health conditions, diseases, symptoms and drugs, and performing drill-down analysis to shed light on individual posts and statements that cause the observed effects.
-
(2018): StoryLens: A Multiple Views Corpus for Location and Event Detection. In: Akerkar, Rajendra; Ivanović, Mirjana; Kim, Sang-Wook; Manolopoulos, Yannis; Rosati, Riccardo; Savić, Miloš; Badica, Costin; Radovanović, Miloš (Eds.): Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, Article No. 30: WIMS '18: Novi Sad, Serbia, 25-27 June: New York, NY, USA: Association for Computing Machinery (ACM). Available online at https://doi.org/10.1145/3227609.3227674, last checked on 21.05.2021
Abstract: The news media landscape tends to focus on long-running narratives. Correctly processing new information, therefore, requires considering multiple lenses when analyzing media content. Traditionally it would have been considered sufficient to extract the topics or entities contained in a text in order to classify it, but today it is important to also look at more sophisticated annotations related to fine-grained geolocation, events, stories and the relations between them. In order to leverage such lenses we propose a new corpus that offers a diverse set of annotations over texts collected from multiple media sources. We also showcase the framework used for creating the corpus, as well as how the information from the various lenses can be used in order to support different use cases in the EU project InVID for verifying the veracity of online video.
-
(2018): Framing Named Entity Linking Error Types. In: Calzolari, Nicoletta; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Hasida, Koiti; Isahara, Hitoshi; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios; Tokunaga, Takenobu (Eds.): Eleventh International Conference on Language Resources and Evaluation: Conference Proceedings. With the collaboration of Sara Goggi and Hélène Mazo: LREC '18: Miyazaki, Japan, 7-12 May: Paris: European Language Resources Association (ELRA), pp. 266-271. Available online at https://www.aclweb.org/anthology/L18-1040/, last checked on 21.05.2021
Abstract: Named Entity Linking (NEL) and relation extraction form the backbone of Knowledge Base Population tasks. The recent rise of large open source Knowledge Bases and the continuous focus on improving NEL performance have led to the creation of automated benchmark solutions during the last decade. Benchmarking NEL systems offers a valuable approach to understanding a NEL system's performance quantitatively. However, an in-depth qualitative analysis that helps improve NEL methods by identifying error causes usually requires a more thorough error analysis. This paper proposes a taxonomy to frame common errors and applies this taxonomy in a survey study to assess the performance of four well-known Named Entity Linking systems on three recent gold standards.
-
(2018): On the Importance of Drill-Down Analysis for Assessing Gold Standards and Named Entity Linking Performance. SEMANTiCS 2018: 14th International Conference on Semantic Systems. In: Procedia Computer Science 137, pp. 33-42. Available online at https://doi.org/10.1016/j.procs.2018.09.004, last checked on 21.05.2021
Abstract: Rigorous evaluations and analyses of evaluation results are key towards improving Named Entity Linking systems. Nevertheless, most current evaluation tools are focused on benchmarking and comparative evaluations. Therefore, they only provide aggregated statistics such as precision, recall and F1-measure to assess system performance and no means for conducting detailed analyses down to the level of individual annotations. This paper addresses the need for transparent benchmarking and fine-grained error analysis by introducing Orbis, an extensible framework that supports drill-down analysis, multiple annotation tasks and resource versioning. Orbis complements approaches like those deployed through the GERBIL and TAC KBP tools and helps developers to better understand and address shortcomings in their Named Entity Linking tools. We present three use cases in order to demonstrate the usefulness of Orbis for both research and production systems: (i) improving Named Entity Linking tools; (ii) detecting gold standard errors; and (iii) performing Named Entity Linking evaluations with multiple versions of the included resources.
-
(2018): Optimierung von Karriere- und Recruitingprozessen mittels Web Analytics und künstlicher Intelligenz (Einblicke in die Forschung). Available online at https://www.fhgr.ch/fileadmin/publikationen/forschungsbericht/fhgr-Einblicke_in_die_Forschung_2018.pdf, last checked on 09.04.2021
-
(2018): Mining and Leveraging Background Knowledge for Improving Named Entity Linking. In: Akerkar, Rajendra; Ivanović, Mirjana; Kim, Sang-Wook; Manolopoulos, Yannis; Rosati, Riccardo; Savić, Miloš; Badica, Costin; Radovanović, Miloš (eds.): Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics (WIMS '18), Article No. 27. Novi Sad, Serbia, June 25-27. New York, NY, USA: Association for Computing Machinery (ACM). Available online at https://doi.org/10.1145/3227609.3227670, last checked on 21.05.2021
Abstract: Knowledge-rich Information Extraction (IE) methods aspire to combine classical IE with background knowledge obtained from third-party resources. Linked Open Data repositories that encode billions of machine-readable facts from sources such as Wikipedia play a pivotal role in this development. The recent growth of Linked Data adoption for Information Extraction tasks has shed light on many data quality issues in these data sources that seriously challenge their usefulness, such as completeness, timeliness and semantic correctness. Information Extraction methods are, therefore, faced with problems such as name variance and type confusability. If multiple linked data sources are used in parallel, additional concerns regarding link stability and entity mappings emerge. This paper develops methods for integrating Linked Data into Named Entity Linking methods and addresses challenges in mining knowledge from Linked Data, mitigating data quality issues, and adapting algorithms to leverage this knowledge. Finally, we apply these methods to Recognyze, a graph-based Named Entity Linking (NEL) system, and provide a comprehensive evaluation that compares its performance to other well-known NEL systems, demonstrating the impact of the suggested methods on entity linking performance.
-
(2018): Optimizing Information Acquisition and Decision Making Processes with Natural Language Processing, Machine Learning and Visual Analytics. 3rd SwissText Analytics Conference. Winterthur, June 12-13, 2018. Available online at https://youtu.be/YicWN1rEn7M, last checked on 28.05.2021
-
(2017): Torpedo: Improving the State-of-the-Art RDF Dataset Slicing. 11th International Conference on Semantic Computing (ICSC). San Diego, CA, USA, January 30 - February 1. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE), pp. 149-156. Available online at https://doi.org/10.1109/ICSC.2017.79, last checked on 21.05.2021
Abstract: Over the last few years, the amount of data published as Linked Data on the Web has grown enormously. In spite of this high availability, organizations still encounter an accessibility challenge when consuming Linked Data. This is mostly due to the large size of some of the datasets published as Linked Data. The core observation behind this work is that a subset of these datasets suffices to address the needs of most organizations. In this paper, we introduce Torpedo, an approach for efficiently selecting and extracting relevant subsets from RDF datasets. In particular, Torpedo adds optimization techniques to reduce the cost of seek operations, as well as support for multi-join graph patterns and SPARQL FILTERs, which enable a more granular data selection. We compare the performance of our approach with existing solutions on nine different queries against four datasets. Our results show that our approach is highly scalable and up to 26% faster than the current state-of-the-art RDF dataset slicing approach.
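The core idea of dataset slicing, extracting only the triples that match a graph pattern with filter conditions, can be sketched in a few lines of plain Python. The triple representation, function, and data below are illustrative assumptions and are unrelated to Torpedo's actual implementation:

```python
# Minimal sketch of RDF dataset slicing: keep only the triples that
# match a fixed predicate plus a FILTER-like condition on the bindings.
# Triples are (subject, predicate, object) strings; the data is made up.
dataset = [
    ("ex:alice", "ex:worksFor",  "ex:acme"),
    ("ex:bob",   "ex:worksFor",  "ex:acme"),
    ("ex:acme",  "ex:foundedIn", "1999"),
    ("ex:carol", "ex:knows",     "ex:alice"),
]

def slice_triples(triples, predicate=None, keep=lambda s, p, o: True):
    """Select a subset: optionally fix the predicate, then apply a filter."""
    return [(s, p, o) for s, p, o in triples
            if (predicate is None or p == predicate) and keep(s, p, o)]

# Slice analogous to the pattern { ?s ex:worksFor ?o . FILTER(?o = ex:acme) }
subset = slice_triples(dataset, predicate="ex:worksFor",
                       keep=lambda s, p, o: o == "ex:acme")
print(subset)  # only the two employment triples survive
```

A real slicer operates over serialized datasets far too large to hold in memory, which is where the seek-cost optimizations mentioned in the abstract matter; the logic of the selection itself, however, is this simple pattern-plus-filter match.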
-
(2017): Semantic Systems and Visual Tools to Support Environmental Communication. In: IEEE Systems Journal 11, pp. 762-771. Available online at https://doi.org/10.1109/JSYST.2015.2466439, last checked on 24.07.2020
Abstract: Given the intense attention that environmental topics such as climate change attract in news and social media coverage, scientists and communication professionals want to know how different stakeholders perceive observable threats and policy options, how specific media channels react to new insights, and how journalists present scientific knowledge to the public. This paper investigates the potential of semantic technologies to address these questions. After summarizing methods to extract and disambiguate context information, we present visualization techniques to explore the lexical, geospatial, and relational context of topics and entities referenced in these repositories. The examples stem from the Media Watch on Climate Change, the Climate Resilience Toolkit, and the NOAA Media Watch: three applications that aggregate environmental resources from a wide range of online sources. These systems not only show the value of providing comprehensive information to the public, but have also helped to develop a novel communication success metric that goes beyond bipolar assessments of sentiment.
-
(2017): Aspect-Based Extraction and Analysis of Affective Knowledge from Social Media Streams. In: IEEE Intelligent Systems 32, pp. 80-88. Available online at https://doi.org/10.1109/MIS.2017.57, last checked on 18.05.2021
Abstract: Extracting and analyzing affective knowledge from social media in a structured manner is a challenging task. Decision makers require insights into the public perception of a company's products and services, as a strategic feedback channel to guide communication campaigns, and as an early warning system to quickly react in the case of unforeseen events. The approach presented in this article goes beyond bipolar metrics of sentiment. It combines factual and affective knowledge extracted from rich public knowledge bases to analyze emotions expressed toward specific entities (targets) in social media. The authors obtain common and common-sense domain knowledge from DBpedia and ConceptNet to identify potential sentiment targets. They employ affective knowledge about emotional categories available from SenticNet to assess how those targets and their aspects (such as specific product features) are perceived in social media. An evaluation shows the usefulness and correctness of the extracted domain knowledge, which is used in a proof-of-concept data analytics application to investigate the perception of car brands on social media in the period between September and November 2015.
-
(2017): Mitigating Linked Data Quality Issues in Knowledge-Intense Information Extraction Methods. In: Akerkar, Rajendra; Cuzzocrea, Alfredo; Cao, Jiannong; Hacid, Mohand-Said (eds.): Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (WIMS '17), Article No. 17. Amantea, Italy, June 19-22. New York, NY, USA: Association for Computing Machinery (ACM). Available online at https://doi.org/10.1145/3102254.3102272, last checked on 21.05.2021
Abstract: Advances in research areas such as named entity linking and sentiment analysis have triggered the emergence of knowledge-intensive information extraction methods that combine classical information extraction with background knowledge from the Web. Despite data quality concerns, linked data sources such as DBpedia, GeoNames and Wikidata which encode facts in a standardized structured format are particularly attractive for such applications. This paper addresses the problem of data quality by introducing a framework that elaborates on linked data quality issues relevant to different stages of the background knowledge acquisition process, their impact on information extraction performance and applicable mitigation strategies. Applying this framework to named entity linking and data enrichment demonstrates the potential of the introduced mitigation strategies to lessen the impact of different kinds of data quality problems. An industrial use case that aims at the automatic generation of image metadata from image descriptions illustrates the successful deployment of knowledge-intensive information extraction in real-world applications and constraints introduced by data quality concerns.
-
(2016): A Regional News Corpora for Contextualized Entity Discovery and Linking. In: Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Goggi, Sara; Grobelnik, Marko; Maegaard, Bente; Mariani, Joseph; Mazo, Hélène; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (eds.): Tenth International Conference on Language Resources and Evaluation: Conference Proceedings (LREC '16). Portorož, Slovenia, May. Paris: European Language Resources Association (ELRA), pp. 3333-3338. Available online at https://www.aclweb.org/anthology/L16-1531, last checked on 21.05.2021