All publications

  • Kaplan, Himmet; Weichselbraun, Albert; Braşoveanu, Adrian M.P. (2023): Integrating Economic Theory, Domain Knowledge, and Social Knowledge into Hybrid Sentiment Models for Predicting Crude Oil Markets. In: Cognitive Computation, last checked on 31.03.2023

    Abstract: For several decades, sentiment analysis has been considered a key indicator for assessing market mood and predicting future price changes. Accurately predicting commodity markets requires an understanding of fundamental market dynamics such as the interplay between supply and demand, which are not considered in standard affective models. This paper introduces two domain-specific affective models, CrudeBERT and CrudeBERT+, that adapt sentiment analysis to the crude oil market by incorporating economic theory with common knowledge of the mentioned entities and social knowledge extracted from Google Trends. To evaluate the predictive capabilities of these models, comprehensive experiments were conducted using dynamic time warping to identify the model that best approximates WTI crude oil futures price movements. The evaluation included news headlines and crude oil prices between January 2012 and April 2021. The results show that CrudeBERT+ outperformed RavenPack, BERT, FinBERT, and early CrudeBERT models during the 9-year evaluation period and within most of the individual years that were analyzed. The success of the introduced domain-specific affective models demonstrates the potential of integrating economic theory with sentiment analysis and external knowledge sources to improve the predictive power of financial sentiment analysis models. The experiments also confirm that CrudeBERT+ has the potential to provide valuable insights for decision-making in the crude oil market.

  • Rölke, Heiko; Weichselbraun, Albert (2023): Ontologien und Linked Open Data. In: Kuhlen, Rainer; Lewandowski, Dirk; Semar, Wolfgang; Womser-Hacker, Christa (eds.): Grundlagen der Informationswissenschaft. 7th, completely revised edition. Berlin: De Gruyter, pp. 257-269. Available online at https://doi.org/10.1515/9783110769043-022, last checked on 16.12.2022

  • Beier, Michael; Hauser, Christian; Weichselbraun, Albert (2022): Compliance-Untersuchungen im Zeitalter von Big Data und künstlicher Intelligenz. In: Compliance-Berater 10. Available online at https://www.researchgate.net/publication/361276309_Compliance-Untersuchungen_im_Zeitalter_von_Big_Data_und_kunstlicher_Intelligenz, last checked on 23.06.2022

    Abstract: IT-supported tools have been used in compliance investigations for more than two decades. Over time, both the scope of application and the methods have changed considerably. On the one hand, the volume of documents, data, and data types to be processed is growing massively. On the other hand, the technical methods for data processing are becoming ever more powerful. The current question is to what extent new technologies from the fields of big data and artificial intelligence (AI) can unlock automation potential that allows compliance investigations to be conducted better, faster, and more cost-effectively. This article presents the current state of practice as well as development potential in the near future.

  • Hauser, Christian; Jehan, Eleanor; Weichselbraun, Albert (2022): Internal Integrity Risk Warning System. Integrity Fund Meeting. Koenig & Bauer Banknote Solutions. Lausanne, 1 July 2022

  • Hauser, Christian; Jehan, Eleanor; Weichselbraun, Albert; Beier, Michael (2022): Whistleblower investigations in the age of Big Data and artificial intelligence. Working Group Meeting. ECS Working Group Whistleblowing. Zürich, 20 June 2022

  • Weichselbraun, Albert; Waldvogel, Roger; Fraefel, Andreas; van Schie, Alexander; Süsstrunk, Norman; Kuntschik, Philipp (2022): Slot Filling for Extracting Reskilling and Upskilling Options from the Web. 27th International Conference on Natural Language & Information Systems (NLDB). Universitat Politècnica de València. Valencia, 17 June 2022. Available online at https://www.youtube.com/watch?v=rIhhKjJAMnY&t=2608s, last checked on 24.11.2022

  • Weichselbraun, Albert; Waldvogel, Roger; Fraefel, Andreas; van Schie, Alexander; Kuntschik, Philipp (2022): Building Knowledge Graphs and Recommender Systems for Suggesting Reskilling and Upskilling Options from the Web. In: Information 13. Available online at https://doi.org/10.3390/info13110510, last checked on 24.11.2022

    Abstract: As advances in science and technology, crises, and increased competition impact labor markets, reskilling and upskilling programs have emerged to mitigate their effects. Since information on continuing education is highly distributed across websites, choosing career paths and suitable upskilling options is currently considered a challenging and cumbersome task. This article, therefore, introduces a method for building a comprehensive knowledge graph from the education providers’ Web pages. We collect educational programs from 488 providers and leverage entity recognition and entity linking methods in conjunction with contextualization to extract knowledge on entities such as prerequisites, skills, learning objectives, and course content. Slot filling then integrates these entities into an extensive knowledge graph that contains close to 74,000 nodes and over 734,000 edges. A recommender system leverages the created graph and background knowledge on occupations to provide career path and upskilling suggestions. Finally, we evaluate the knowledge extraction approach on the CareerCoach 2022 gold standard and draw upon domain experts for judging the career paths and upskilling suggestions provided by the recommender system.

  • Weichselbraun, Albert; van Schie, Alexander; Fraefel, Andreas; Kuntschik, Philipp; Waldvogel, Roger (2022): Career Coach. Automatische Wissensextraktion und Expertensystem für personalisierte Re- und Upskilling Vorschläge. In: Forster, Michael; Alt, Sharon; Hanselmann, Marcel; Deflorin, Patricia (eds.): Digitale Transformation an der Fachhochschule Graubünden: Case Studies aus Forschung und Lehre. Chur: FH Graubünden Verlag, pp. 11-18

    Abstract: CareerCoach develops methods for automatically extracting continuing education offerings. The system analyzes the websites of education providers and integrates their offerings into a central knowledge graph that supports innovative services such as semantic search and expert systems.

  • Hauser, Christian; Havelka, Anina; Hörler, Sandro; Weichselbraun, Albert (2021): Towards Developing an Integrity Risk Monitor (IRM). A Status Report. In: Makowicz, Bartosz: Global Ethics, Compliance & Integrity: Yearbook 2021. Bern: Peter Lang, pp. 123-131

    Abstract: Risks, which could jeopardize the integrity of a company, are widespread. This holds true for firms located in Switzerland too. According to a recent study by PricewaterhouseCoopers (2018), almost 40 percent of Swiss companies have been affected by illegal and unethical behavior, such as embezzlement, cybercrime, intellectual property infringements, corruption, fraud, money laundering, and anti-competitive agreements. Although the number of cases in Switzerland is relatively low when compared to other countries globally, the financial damage for affected Swiss companies caused by these incidents is nevertheless above the global average.

  • Hauser, Christian; Weichselbraun, Albert; Havelka, Anina; Hörler, Sandro; Waldvogel, Roger (2021): Integrity Risk Monitor. Chur: FH Graubünden Verlag. Available online at https://www.fhgr.ch/fhgr/unternehmerisches-handeln/schweizerisches-institut-fuer-entrepreneurship-sife/projekte/integrity-risk-monitor-irm/, last checked on 17.03.2022

    Abstract: Corporate integrity has gained importance nationally and internationally in recent years. The business press repeatedly covers the conduct of companies that fail to live up to their corporate responsibility. At the same time, various stakeholder groups demand more transparency from companies regarding their activities. This prompts companies to report, in their non-financial reporting, on their efforts toward business conduct with integrity in the areas of human rights, the environment, and anti-corruption. As part of the Integrity Risk Monitor (IRM) research project, the IRM portal and the IRM dashboard were developed. Both are web-based real-time monitoring tools. The IRM portal contains media reports from the past 25 years from a variety of sources. In addition, the algorithm continuously crawls the World Wide Web and collects new articles from editorial media. These can be examined with the IRM dashboard using various analysis and visualization options to identify relationships, involved parties, sentiments, and main geographic regions. The project also examined companies' non-financial reporting in order to analyze relationships between media coverage and non-financial reporting. The results show that both the media and the analyzed companies have reported more on human rights, the environment, and corruption over the past 25 years, but that for the time being there is no direct linear relationship between these two forms of reporting.

  • Hauser, Christian; Weichselbraun, Albert; Jehan, Eleanor; Schmid, Marco (2021): Internal integrity risk warning system (IIRWiS). Integrity Fund Meeting. Koenig & Bauer Banknote Solutions. Online, 29 March 2021

  • Weichselbraun, Albert; Steixner, Jakob; Braşoveanu, Adrian M.P.; Scharl, Arno; Göbel, Max; Nixon, Lyndon J.B. (2021): Automatic Expansion of Domain-Specific Affective Models for Web Intelligence Applications. In: Cognitive Computation. Available online at https://doi.org/10.1007/s12559-021-09839-4, last checked on 18.02.2021

    Abstract: Sentic computing relies on well-defined affective models of different complexity—polarity to distinguish positive and negative sentiment, for example, or more nuanced models to capture expressions of human emotions. When used to measure communication success, even the most granular affective model combined with sophisticated machine learning approaches may not fully capture an organisation’s strategic positioning goals. Such goals often deviate from the assumptions of standardised affective models. While certain emotions such as Joy and Trust typically represent desirable brand associations, specific communication goals formulated by marketing professionals often go beyond such standard dimensions. For instance, the brand manager of a television show may consider fear or sadness to be desired emotions for its audience. This article introduces expansion techniques for affective models, combining common and commonsense knowledge available in knowledge graphs with language models and affective reasoning, improving coverage and consistency as well as supporting domain-specific interpretations of emotions. An extensive evaluation compares the performance of different expansion techniques: (i) a quantitative evaluation based on the revisited Hourglass of Emotions model to assess performance on complex models that cover multiple affective categories, using manually compiled gold standard data, and (ii) a qualitative evaluation of a domain-specific affective model for television programme brands. The results of these evaluations demonstrate that the introduced techniques support a variety of embeddings and pre-trained models. The paper concludes with a discussion on applying this approach to other scenarios where affective model resources are scarce.

  • Weichselbraun, Albert; Kuntschik, Philipp; Francolino, Vincenzo; Saner, Mirco; Dahinden, Urs; Wyss, Vinzenz (2021): Adapting Data-Driven Research to the Fields of Social Sciences and the Humanities. In: Future Internet 13. Available online at doi.org/10.3390/fi13030059, last checked on 18.05.2021

    Abstract: Recent developments in the fields of computer science, such as advances in the areas of big data, knowledge extraction, and deep learning, have triggered the application of data-driven research methods to disciplines such as the social sciences and humanities. This article presents a collaborative, interdisciplinary process for adapting data-driven research to research questions within other disciplines, which considers the methodological background required to obtain a significant impact on the target discipline and guides the systematic collection and formalization of domain knowledge, as well as the selection of appropriate data sources and methods for analyzing, visualizing, and interpreting the results. Finally, we present a case study that applies the described process to the domain of communication science by creating approaches that aid domain experts in locating, tracking, analyzing, and, finally, better understanding the dynamics of media criticism. The study clearly demonstrates the potential of the presented method, but also shows that data-driven research approaches require a tighter integration with the methodological framework of the target discipline to really provide a significant impact on the target discipline.

  • Weichselbraun, Albert (2021): Inscriptis: A Python-based HTML to text conversion library optimized for knowledge extraction from the Web. In: Journal of Open Source Software 6. Available online at https://doi.org/10.21105/joss.03557, last checked on 22.10.2021

    Abstract: Inscriptis provides a library, command line client and Web service for converting HTML to plain text. Its development has been triggered by the need to obtain accurate text representations for knowledge extraction tasks that preserve the spatial alignment of text without drawing upon heavyweight, browser-based solutions such as Selenium (Huggins et al., 2021). In contrast to existing software packages such as HTML2text (Swartz, 2021), jusText (Belica, 2021) and Lynx (Dickey, 2021), Inscriptis (1) provides a layout-aware conversion of HTML that more closely resembles the rendering obtained from standard Web browsers and, therefore, better preserves the spatial arrangement of text elements, and (2) supports annotation rules, i.e., user-provided mappings that allow for annotating the extracted text based on structural and semantic information encoded in HTML tags and attributes used for controlling structure and layout in the original HTML document. Regarding the first point, Inscriptis excels in terms of conversion quality, since it correctly converts complex HTML constructs such as nested tables and also interprets a subset of HTML (e.g., align, valign) and CSS (e.g., display, white-space, margin-top, vertical-align, etc.) attributes that determine the text alignment. These unique features ensure that downstream knowledge extraction components can operate on accurate text representations, and may even use information on the semantics and structure of the original HTML document, if annotation support has been enabled.

  • Braşoveanu, Adrian M.P.; Weichselbraun, Albert; Nixon, Lyndon J.B. (2020): In Media Res: A Corpus for Evaluating Named Entity Linking with Creative Works. In: Fernández, Raquel; Linzen, Tal (eds.): Proceedings of the 24th Conference on Computational Natural Language Learning: CoNLL 2020. Online, 19-20 November. Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 355-364. Available online at doi.org/10.18653/v1/2020.conll-1.28, last checked on 21.05.2021

    Abstract: Annotation styles express guidelines that direct human annotators in what rules to follow when creating gold standard annotations of text corpora. These guidelines not only shape the gold standards they help create, but also influence the training and evaluation of Named Entity Linking (NEL) tools, since different annotation styles correspond to divergent views on the entities present in the same texts. Such divergence is particularly present in texts from the media domain that contain references to creative works. In this work we present a corpus of 1000 annotated documents selected from the media domain. Each document is presented with multiple gold standard annotations representing various annotation styles. This corpus is used to evaluate a series of Named Entity Linking tools in order to understand the impact of the differences in annotation styles on the reported accuracy when processing highly ambiguous entities such as names of creative works. Relaxed annotation guidelines that include overlap styles lead to better results across all tools.

  • Hauser, Christian; Hörler, Sandro; Weichselbraun, Albert (2020): Development and publication of the Integrity Risk Monitor (IRM). Integrity Fund. Meeting of the project managers. Koenig & Bauer Banknote Solutions. Lausanne, 22 January 2020

  • Hauser, Christian; Weichselbraun, Albert (2020): Applications of Deep Learning in Integrity Management. Integrity Fund. Board Meeting. Koenig & Bauer Banknote Solutions. Online, 14 December 2020

  • Weichselbraun, Albert; Kuntschik, Philipp; Hörler, Sandro (2020): Optimierung von Unternehmensbewertungen durch automatisierte Wissensidentifikation, -extraktion und -integration. In: Information. Wissenschaft & Praxis 71, pp. 321-325. Available online at https://doi.org/10.1515/iwp-2020-2119, last checked on 30.10.2020

    Abstract: Company valuations in the biotech, pharmaceutical, and medical technology sectors are a demanding task, especially when taking into account the unique risks biotech start-ups face when entering new markets. Companies specializing in global valuation services therefore combine valuation models and past experience with heterogeneous metrics and indicators that provide insights into a company's performance. This article illustrates how automated knowledge discovery, extraction, and integration can be used (i) to obtain additional indicators that provide insights into a company's success in product development and (ii) to support labor-intensive data collection processes for company valuation.

  • Weichselbraun, Albert; Kuntschik, Philipp; Hörler, Sandro (2020): Improving Company Valuations with Automated Knowledge Discovery, Extraction and Fusion. English translation of the article: "Optimierung von Unternehmensbewertungen durch automatisierte Wissensidentifikation, -extraktion und -integration". Information - Wissenschaft und Praxis 71 (5-6):321-325. Available online at https://arxiv.org/abs/2010.09249, last checked on 18.05.2021

    Abstract: Performing company valuations within the domain of biotechnology, pharmacy and medical technology is a challenging task, especially when considering the unique set of risks biotech start-ups face when entering new markets. Companies specialized in global valuation services, therefore, combine valuation models and past experience with heterogeneous metrics and indicators that provide insights into a company's performance. This paper illustrates how automated knowledge discovery, extraction and data fusion can be used to (i) obtain additional indicators that provide insights into the success of a company's product development efforts, and (ii) support labor-intensive data curation processes. We apply deep web knowledge acquisition methods to identify and harvest data on clinical trials that is hidden behind proprietary search interfaces and integrate the extracted data into the industry partner's company valuation ontology. In addition, focused Web crawls and shallow semantic parsing yield information on the company's key personnel and respective contact data, notifying domain experts of relevant changes that get then incorporated into the industry partner's company data.

  • Weichselbraun, Albert; Hörler, Sandro; Hauser, Christian; Havelka, Anina (2020): Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence. In: Chbeir, Richard; Manolopoulos, Yannis; Akerkar, Rajendra; Mizera-Pietraszko, Jolanta (eds.): Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics: WIMS 2020. Biarritz, France, 30 June - 3 July. New York, NY, USA: Association for Computing Machinery (ACM), pp. 54-62. Available online at doi.org/10.1145/3405962.3405988, last checked on 21.05.2021

    Abstract: A substantial number of international corporations have been affected by corruption. The research presented in this paper introduces the Integrity Risks Monitor, an analytics dashboard that applies Web Intelligence and Deep Learning to English- and German-language documents for the tasks of (i) tracking and visualizing past corruption management gaps and their respective impacts, (ii) understanding present and past integrity issues, and (iii) supporting companies in analyzing news media for identifying and mitigating integrity risks. Afterwards, we discuss the design, implementation, training and evaluation of classification components capable of identifying English documents covering the integrity topic of corruption. Domain experts created a gold standard dataset compiled from Anglo-American media coverage on corruption cases that has been used for training and evaluating the classifier. The experiments performed to evaluate the classifiers draw upon popular algorithms used for text classification such as Naïve Bayes, Support Vector Machines (SVM) and Deep Learning architectures (LSTM, BiLSTM, CNN) that draw upon different word embeddings and document representations. They also demonstrate that although classical machine learning approaches such as Naïve Bayes struggle with the diversity of the media coverage on corruption, state-of-the-art Deep Learning models perform sufficiently well in the project's context.

  • Weichselbraun, Albert; Braşoveanu, Adrian M.P.; Waldvogel, Roger; Odoni, Fabian (2020): Harvest: An Open Source Toolkit for Extracting Posts and Post Metadata from Web Forums. The 20th IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology: A Hybrid Conference with both Online and Offline Modes. Melbourne, Australia, 14-17 December

    Abstract: Web forums discuss topics of long-term, persisting involvements in domains such as health, mobile software development and online gaming, some of which are of high interest from a research and business perspective. In the medical domain, for example, forums contain information on symptoms, drug side effects and patient discussions that are highly relevant for patient-focused healthcare and drug development. Automatic extraction of forum posts and metadata is a crucial but challenging task since forums do not expose their content in a standardized structure. Content extraction methods, therefore, often need customizations such as adaptations to page templates and improvements of their extraction code before they can be deployed to new forums. Most of the current solutions are also built for the more general case of content extraction from web pages and lack key features important for understanding forum content such as the identification of author metadata and information on the thread structure. This paper, therefore, presents a method that determines the XPath of forum posts, eliminating incorrect mergers and splits of the extracted posts that were common in systems from the previous generation. Based on the individual posts further metadata such as authors, forum URL and structure are extracted. We evaluate our approach by creating a gold standard which contains 102 forum pages from 52 different Web forums, and benchmarking against a baseline and competing tools.

  • Weichselbraun, Albert; Hauser, Christian; Hörler, Sandro; Havelka, Anina (2020): Deep learning and visual tools for analyzing and monitoring integrity risks. 5th SwissText & 16th KONVENS Joint Conference. Online, 23-25 June 2020. Available online at https://youtu.be/S9Oxw_UlaW0, last checked on 28.05.2021

  • Odoni, Fabian; Braşoveanu, Adrian M.P.; Kuntschik, Philipp; Weichselbraun, Albert (2019): Introducing orbis. An extendable evaluation pipeline for named entity linking performance drill-down analyses. In: Blake, Catherine; Brown, Cecelia (eds.): 82nd Annual Meeting of The Association for Information Science: Proceedings, 56: ASIS&T 2019. Melbourne, Australia, 19-23 October. Somerset, NJ, USA: John Wiley & Sons, Ltd, pp. 468-471. Available online at doi.org/10.1002/pra2.49, last checked on 21.05.2021

    Abstract: Most current evaluation tools focus solely on benchmarking and comparative evaluations and thus provide only aggregated statistics such as precision, recall and F1-measure to assess overall system performance. They do not offer comprehensive analyses down to the level of individual annotations. This paper introduces Orbis, an extendable evaluation pipeline framework developed to allow visual drill-down analyses of individual entities, computed by annotation services, in the context of the text they appear in and in reference to the entities specified in the gold standard.

  • Rinaldi, Fabio; Kuntschik, Philipp; Gottowik, Jürgen; Leddin, Mathias; Esteban, Raul R.; Weichselbraun, Albert; Ellendorff, Tilia; Colic, Nico; Furrer, Lenz (2019): MedMon: social media analytics for an healthcare application. 4th SwissText Analytics Conference. Winterthur, 18-19 June 2019. Available online at https://youtu.be/SA61WJ57XAc, last checked on 28.05.2021

  • Weichselbraun, Albert (2019): Datenakquiseprozesse mittels Big Data optimieren (Einblicke in die Forschung). Available online at https://www.fhgr.ch/fileadmin/publikationen/forschungsbericht/fhgr-Einblicke_in_die_Forschung_2019.pdf, last checked on 09.04.2021

    Abstract: As part of the DISCOVER project, methods are being developed for automated data acquisition and for the extraction and integration of decision-relevant information from heterogeneous online sources; these methods are also capable of analyzing content from the deep web.

  • Weichselbraun, Albert; Kuntschik, Philipp; Braşoveanu, Adrian M.P. (2019): Name Variants for Improving Entity Discovery and Linking. In: Eskevich, Maria; Melo, Gerard de; Fäth, Christian; McCrae, John P.; Buitelaar, Paul; Chiarcos, Christian; Klimek, Bettina; Dojchinovski, Milan (eds.): 2nd Conference on Language, Data and Knowledge: LDK 2019. Leipzig, 20-23 May. Saarbrücken/Wadern: Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing (OASIcs), pp. 14:1-14:15. Available online at https://doi.org/10.4230/OASIcs.LDK.2019.14, last checked on 21.05.2021

    Abstract: Identifying all names that refer to a particular set of named entities is a challenging task, as quite often we need to consider many features that include a lot of variation like abbreviations, aliases, hypocorism, multilingualism or partial matches. Each entity type can also have specific rules for name variances: person names can include titles, country and branch names are sometimes removed from organization names, while locations are often plagued by the issue of nested entities. The lack of a clear strategy for collecting, processing and computing name variants significantly lowers the recall of tasks such as Named Entity Linking and Knowledge Base Population since name variances are frequently used in all kinds of textual content. This paper proposes several strategies to address these issues. Recall can be improved by combining knowledge repositories and by computing additional variances based on algorithmic approaches. Heuristics and machine learning methods then analyze the generated name variances and mark ambiguous names to increase precision. An extensive evaluation demonstrates the effects of integrating these methods into a new Named Entity Linking framework and confirms that systematically considering name variances yields significant performance improvements.

  • Weichselbraun, Albert; Braşoveanu, Adrian M.P.; Kuntschik, Philipp; Nixon, Lyndon J.B. (2019): Improving Named Entity Linking Corpora Quality. In: Angelova, Galia; Mitkov, Ruslan; Nikolova, Ivelina; Temnikova, Irina (eds.): Natural Language Processing in a Deep Learning World: Proceedings: International Conference Recent Advances in Natural Language Processing (RANLP 2019). Varna, Bulgaria, 2-4 September. Bulgaria: Ltd., Shoumen, pp. 1328-1337. Available online at https://doi.org/10.26615/978-954-452-056-4_152, last checked on 21.05.2021

    Abstract: Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems. The quality of the used corpora and the used evaluation metrics are crucial in this process. We, therefore, assess the quality of three popular evaluation corpora, identifying four major issues which affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) Knowledge Base evolution, and (iv) differences in annotating co-occurrences. This paper addresses these issues by formalizing NEL annotations and corpus versioning which allows standardizing corpus creation, supports corpus evolution, and paves the way for the use of lenses to automatically transform between different corpus configurations. In addition, the use of clearly defined scoring rules and evaluation metrics ensures a better comparability of evaluation results.

  • Weichselbraun, Albert (2019): Capturing, analyzing and visualizing user generated content from social media. 27th Conference on Intelligent Systems for Molecular Biology (ISMB); 17th European Conference on Computational Biology (ECCB); Special session on Social media mining for drug discovery research: challenges and opportunities of Real World Text. Basel, 21-25 June 2019

    Abstract: Source format variability and noise are major challenges when harvesting content from social media. This presentation discusses methods and abstractions for gathering user generated content from Web pages and social media platforms covering (i) structured content, (ii) platforms that leverage Semantic Web standards such as Microformats, RDFa and JSON-LD, and (iii) semi-structured or even unstructured content that is typically found in Web forums. We then discuss pre-processing and anonymization tasks and outline how the collected content is annotated, aggregated and summarized in a so-called contextualized information space. An interactive dashboard provides efficient means for analyzing, browsing and visualizing this information space. The dashboard supports analysts in identifying emerging trends and topics, exploring the lexical, geospatial and relational context of topics and entities such as health conditions, diseases, symptoms and drugs, and performing drill-down analysis to shed light on individual posts and statements that cause the observed effects.

  • Braşoveanu, Adrian M.P.; Nixon, Lyndon J.B.; Weichselbraun, Albert (2018): StoryLens: A Multiple Views Corpus for Location and Event Detection. In: Akerkar, Rajendra; Ivanović, Mirjana; Kim, Sang-Wook; Manolopoulos, Yannis; Rosati, Riccardo; Savić, Miloš; Badica, Costin; Radovanović, Miloš (eds.): Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, Article No.: 30: WIMS '18: Novi Sad, Serbia, 25-27 June: New York, NY, USA: Association for Computing Machinery (ACM). Available online at doi.org/10.1145/3227609.3227674, last checked on 21.05.2021

    Abstract: The news media landscape tends to focus on long-running narratives. Correctly processing new information, therefore, requires considering multiple lenses when analyzing media content. Traditionally it would have been considered sufficient to extract the topics or entities contained in a text in order to classify it, but today it is important to also look at more sophisticated annotations related to fine-grained geolocation, events, stories and the relations between them. In order to leverage such lenses we propose a new corpus that offers a diverse set of annotations over texts collected from multiple media sources. We also showcase the framework used for creating the corpus, as well as how the information from the various lenses can be used in order to support different use cases in the EU project InVID for verifying the veracity of online video.

  • Braşoveanu, Adrian M.P.; Rizzo, Giuseppe; Kuntschik, Philipp; Weichselbraun, Albert; Nixon, Lyndon J.B. (2018): Framing Named Entity Linking Error Types. In: Calzolari, Nicoletta; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Hasida, Koiti; Isahara, Hitoshi; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios; Tokunaga, Takenobu (eds.): Eleventh International Conference on Language Resources and Evaluation: Conference Proceedings. With the assistance of Sara Goggi and Hélène Mazo: LREC '18: Miyazaki, Japan, 7-12 May: Paris: European Language Resources Association (ELRA), pp. 266-271. Available online at https://www.aclweb.org/anthology/L18-1040/, last checked on 21.05.2021

    Abstract: Named Entity Linking (NEL) and relation extraction form the backbone of Knowledge Base Population tasks. The recent rise of large open source Knowledge Bases and the continuous focus on improving NEL performance have led to the creation of automated benchmark solutions during the last decade. The benchmarking of NEL systems offers a valuable approach to understand a NEL system’s performance quantitatively. However, an in-depth qualitative analysis that helps improve NEL methods by identifying error causes usually requires a more thorough error analysis. This paper proposes a taxonomy to frame common errors and applies this taxonomy in a survey study to assess the performance of four well-known Named Entity Linking systems on three recent gold standards.

  • Odoni, Fabian; Kuntschik, Philipp; Braşoveanu, Adrian M.P.; Weichselbraun, Albert (2018): On the Importance of Drill-Down Analysis for Assessing Gold Standards and Named Entity Linking Performance. SEMANTiCS 2018: 14th International Conference on Semantic Systems. In: Procedia Computer Science 137, pp. 33-42. Available online at https://doi.org/10.1016/j.procs.2018.09.004, last checked on 21.05.2021

    Abstract: Rigorous evaluations and analyses of evaluation results are key towards improving Named Entity Linking systems. Nevertheless, most current evaluation tools are focused on benchmarking and comparative evaluations. Therefore, they only provide aggregated statistics such as precision, recall and F1-measure to assess system performance and no means for conducting detailed analyses up to the level of individual annotations. This paper addresses the need for transparent benchmarking and fine-grained error analysis by introducing Orbis, an extensible framework that supports drill-down analysis, multiple annotation tasks and resource versioning. Orbis complements approaches like those deployed through the GERBIL and TAC KBP tools and helps developers to better understand and address shortcomings in their Named Entity Linking tools. We present three use cases in order to demonstrate the usefulness of Orbis for both research and production systems: (i) improving Named Entity Linking tools; (ii) detecting gold standard errors; and (iii) performing Named Entity Linking evaluations with multiple versions of the included resources.

  • Weichselbraun, Albert (2018): Optimierung von Karriere- und Recruitingprozessen mittels Web Analytics und künstlicher Intelligenz (Einblicke in die Forschung). Available online at https://www.fhgr.ch/fileadmin/publikationen/forschungsbericht/fhgr-Einblicke_in_die_Forschung_2018.pdf, last checked on 09.04.2021

    Abstract: Automated methods can support the targeted search for qualified candidates, the analysis of career trajectories, and career planning and continuing education processes.

  • Weichselbraun, Albert; Kuntschik, Philipp; Braşoveanu, Adrian M.P. (2018): Mining and Leveraging Background Knowledge for Improving Named Entity Linking. In: Akerkar, Rajendra; Ivanović, Mirjana; Kim, Sang-Wook; Manolopoulos, Yannis; Rosati, Riccardo; Savić, Miloš; Badica, Costin; Radovanović, Miloš (eds.): Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, Article No.: 27: WIMS '18: Novi Sad, Serbia, 25-27 June: New York, NY, USA: Association for Computing Machinery (ACM). Available online at doi.org/10.1145/3227609.3227670, last checked on 21.05.2021

    Abstract: Knowledge-rich Information Extraction (IE) methods aspire towards combining classical IE with background knowledge obtained from third-party resources. Linked Open Data repositories that encode billions of machine readable facts from sources such as Wikipedia play a pivotal role in this development. The recent growth of Linked Data adoption for Information Extraction tasks has shed light on many data quality issues in these data sources that seriously challenge their usefulness such as completeness, timeliness and semantic correctness. Information Extraction methods are, therefore, faced with problems such as name variance and type confusability. If multiple linked data sources are used in parallel, additional concerns regarding link stability and entity mappings emerge. This paper develops methods for integrating Linked Data into Named Entity Linking methods and addresses challenges in regard to mining knowledge from Linked Data, mitigating data quality issues, and adapting algorithms to leverage this knowledge. Finally, we apply these methods to Recognyze, a graph-based Named Entity Linking (NEL) system, and provide a comprehensive evaluation which compares its performance to other well-known NEL systems, demonstrating the impact of the suggested methods on its own entity linking performance.

  • Weichselbraun, Albert (2018): On the convergence of Artificial Intelligence and Big Data. Potential, challenges and impact. Keynote. Graubünden forscht. Academia Raetica. Davos, 19 September 2018

  • Weichselbraun, Albert; Kuntschik, Philipp; Süsstrunk, Norman; Odoni, Fabian; Braşoveanu, Adrian M.P. (2018): Optimizing Information Acquisition and Decision Making Processes with Natural Language Processing, Machine Learning and Visual Analytics. 3rd SwissText Analytics Conference. Winterthur, 12-13 June 2018. Available online at https://youtu.be/YicWN1rEn7M, last checked on 28.05.2021

  • Marx, Edgard; Shekarpour, Saeedeh; Soru, Tommaso; Braşoveanu, Adrian M.P.; Saleem, Muhammad; Baron, Ciro; Weichselbraun, Albert; Lehmann, Jens; Ngomo, Axel-Cyrille Ngonga; Auer, Soren (2017): Torpedo: Improving the State-of-the-Art RDF Dataset Slicing. 11th International Conference on Semantic Computing: ICSC: San Diego, CA, USA, 30 January - 1 February: Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE), pp. 149-156. Available online at https://doi.org/10.1109/ICSC.2017.79, last checked on 21.05.2021

    Abstract: Over the last years, the amount of data published as Linked Data on the Web has grown enormously. In spite of the high availability of Linked Data, organizations still encounter an accessibility challenge while consuming it. This is mostly due to the large size of some of the datasets published as Linked Data. The core observation behind this work is that a subset of these datasets suffices to address the needs of most organizations. In this paper, we introduce Torpedo, an approach for efficiently selecting and extracting relevant subsets from RDF datasets. In particular, Torpedo adds optimization techniques to reduce seek operation costs as well as support for multi-join graph patterns and SPARQL FILTERs that enable a more granular data selection. We compare the performance of our approach with existing solutions on nine different queries against four datasets. Our results show that our approach is highly scalable and is up to 26% faster than the current state-of-the-art RDF dataset slicing approach.

  • Scharl, Arno; Herring, David; Rafelsberger, Walter; Hubmann-Haidvogel, Alexander; Kamolov, Ruslan; Fischl, Daniel; Fols, Michael; Weichselbraun, Albert (2017): Semantic Systems and Visual Tools to Support Environmental Communication. In: IEEE Systems Journal 11, pp. 762-771. Available online at https://doi.org/10.1109/JSYST.2015.2466439, last checked on 24.07.2020

    Abstract: Given the intense attention that environmental topics such as climate change attract in news and social media coverage, scientists and communication professionals want to know how different stakeholders perceive observable threats and policy options, how specific media channels react to new insights, and how journalists present scientific knowledge to the public. This paper investigates the potential of semantic technologies to address these questions. After summarizing methods to extract and disambiguate context information, we present visualization techniques to explore the lexical, geospatial, and relational context of topics and entities referenced in these repositories. The examples stem from the Media Watch on Climate Change, the Climate Resilience Toolkit and the NOAA Media Watch, three applications that aggregate environmental resources from a wide range of online sources. These systems not only show the value of providing comprehensive information to the public, but also have helped to develop a novel communication success metric that goes beyond bipolar assessments of sentiment.

  • Weichselbraun, Albert; Gindl, Stefan; Fischer, Fabian; Vakulenko, Svitlana; Scharl, Arno (2017): Aspect-Based Extraction and Analysis of Affective Knowledge from Social Media Streams. In: IEEE Intelligent Systems 32, pp. 80-88. Available online at doi.org/10.1109/MIS.2017.57, last checked on 18.05.2021

    Abstract: Extracting and analyzing affective knowledge from social media in a structured manner is a challenging task. Decision makers require insights into the public perception of a company's products and services, as a strategic feedback channel to guide communication campaigns, and as an early warning system to quickly react in the case of unforeseen events. The approach presented in this article goes beyond bipolar metrics of sentiment. It combines factual and affective knowledge extracted from rich public knowledge bases to analyze emotions expressed toward specific entities (targets) in social media. The authors obtain common and common-sense domain knowledge from DBpedia and ConceptNet to identify potential sentiment targets. They employ affective knowledge about emotional categories available from SenticNet to assess how those targets and their aspects (such as specific product features) are perceived in social media. An evaluation shows the usefulness and correctness of the extracted domain knowledge, which is used in a proof-of-concept data analytics application to investigate the perception of car brands on social media in the period between September and November 2015.

  • Weichselbraun, Albert; Kuntschik, Philipp (2017): Mitigating linked data quality issues in knowledge-intense information extraction methods. In: Akerkar, Rajendra; Cuzzocrea, Alfredo; Cao, Jannong; Hacid, Mohand-Said (eds.): Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, Article No.: 17: WIMS '17: Amantea, Italy, 19-22 June: New York, NY, USA: Association for Computing Machinery (ACM). Available online at https://doi.org/10.1145/3102254.3102272, last checked on 21.05.2021

    Abstract: Advances in research areas such as named entity linking and sentiment analysis have triggered the emergence of knowledge-intensive information extraction methods that combine classical information extraction with background knowledge from the Web. Despite data quality concerns, linked data sources such as DBpedia, GeoNames and Wikidata which encode facts in a standardized structured format are particularly attractive for such applications. This paper addresses the problem of data quality by introducing a framework that elaborates on linked data quality issues relevant to different stages of the background knowledge acquisition process, their impact on information extraction performance and applicable mitigation strategies. Applying this framework to named entity linking and data enrichment demonstrates the potential of the introduced mitigation strategies to lessen the impact of different kinds of data quality problems. An industrial use case that aims at the automatic generation of image metadata from image descriptions illustrates the successful deployment of knowledge-intensive information extraction in real-world applications and constraints introduced by data quality concerns.

  • Braşoveanu, Adrian M.P.; Nixon, Lyndon J.B.; Weichselbraun, Albert; Scharl, Arno (2016): A Regional News Corpora for Contextualized Entity Discovery and Linking. In: Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Goggi, Sara; Grobelnik, Marko; Maegaard, Bente; Mariani, Joseph; Mazo, Hélène; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (eds.): Tenth International Conference on Language Resources and Evaluation: Conference Proceedings: LREC '16: Portorož, Slovenia, May: Paris: European Language Resources Association (ELRA), pp. 3333-3338. Available online at https://www.aclweb.org/anthology/L16-1531, last checked on 21.05.2021

    Abstract: This paper presents a German corpus for Named Entity Linking (NEL) and Knowledge Base Population (KBP) tasks. We describe the annotation guideline, the annotation process, NIL clustering techniques and conversion to popular NEL formats such as NIF and TAC that have been used to construct this corpus based on news transcripts from the German regional broadcaster RBB (Rundfunk Berlin Brandenburg). Since creating such language resources requires significant effort, the paper also discusses how to derive additional evaluation resources for tasks like named entity contextualization or ontology enrichment by exploiting the links between named entities from the annotated corpus. The paper concludes with an evaluation that shows how several well-known NEL tools perform on the corpus, a discussion of the evaluation results, and with suggestions on how to keep evaluation corpora and datasets up to date.

  • Scharl, Arno; Hubmann-Haidvogel, Alexander; Jones, Alistair; Fischl, Daniel; Kamolov, Ruslan; Weichselbraun, Albert; Rafelsberger, Walter (2016): Analyzing the public discourse on works of fiction. Detection and visualization of emotion in online coverage about HBO's Game of Thrones. In: Information processing & management 52, pp. 129-138. Available online at doi.org/10.1016/j.ipm.2015.02.003, last checked on 18.05.2021

    Abstract: This paper presents a Web intelligence portal that captures and aggregates news and social media coverage about "Game of Thrones", an American drama television series created for the HBO television network based on George R.R. Martin's series of fantasy novels. The system collects content from the Web sites of Anglo-American news media as well as from four social media platforms: Twitter, Facebook, Google+ and YouTube. An interactive dashboard with trend charts and synchronized visual analytics components not only shows how often Game of Thrones events and characters are being mentioned by journalists and viewers, but also provides a real-time account of concepts that are being associated with the unfolding storyline and each new episode. Positive or negative sentiment is computed automatically, which sheds light on the perception of actors and new plot elements.
