Publications UAS Grisons
Overview

Enter a search term or use the advanced search function to filter your search results by author, year of publication, or document type.

Publications

  • Kaplan, Himmet; Weichselbraun, Albert; Braşoveanu, Adrian M.P. (2023): Integrating Economic Theory, Domain Knowledge, and Social Knowledge into Hybrid Sentiment Models for Predicting Crude Oil Markets. In: Cognitive Computation, last checked on 31.03.2023

    Abstract: For several decades, sentiment analysis has been considered a key indicator for assessing market mood and predicting future price changes. Accurately predicting commodity markets requires an understanding of fundamental market dynamics such as the interplay between supply and demand, which are not considered in standard affective models. This paper introduces two domain-specific affective models, CrudeBERT and CrudeBERT+, that adapt sentiment analysis to the crude oil market by incorporating economic theory with common knowledge of the mentioned entities and social knowledge extracted from Google Trends. To evaluate the predictive capabilities of these models, comprehensive experiments were conducted using dynamic time warping to identify the model that best approximates WTI crude oil futures price movements. The evaluation included news headlines and crude oil prices between January 2012 and April 2021. The results show that CrudeBERT+ outperformed RavenPack, BERT, FinBERT, and early CrudeBERT models during the 9-year evaluation period and within most of the individual years that were analyzed. The success of the introduced domain-specific affective models demonstrates the potential of integrating economic theory with sentiment analysis and external knowledge sources to improve the predictive power of financial sentiment analysis models. The experiments also confirm that CrudeBERT+ has the potential to provide valuable insights for decision-making in the crude oil market.
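    The dynamic time warping (DTW) measure used in the evaluation above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the paper's code; the function name and series values are invented for the example.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two numeric series."""
    inf = float("inf")
    dp = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    dp[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible alignments
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[len(a)][len(b)]

# A sentiment curve that leads the price curve by one step still aligns
# perfectly, which is why DTW is suited to asking which affective model
# best approximates the price movements.
sentiment = [0.1, 0.5, 0.9, 0.4]
price = [0.1, 0.1, 0.5, 0.9, 0.4]
print(dtw_distance(sentiment, price))  # 0.0
```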

  • Rölke, Heiko; Weichselbraun, Albert (2023): Ontologien und Linked Open Data. In: Kuhlen, Rainer; Lewandowski, Dirk; Semar, Wolfgang; Womser-Hacker, Christa (Eds.): Grundlagen der Informationswissenschaft: 7th, completely revised edition. Berlin: De Gruyter, pp. 257-269. Available online at https://doi.org/10.1515/9783110769043-022, last checked on 16.12.2022

  • Beier, Michael; Hauser, Christian; Weichselbraun, Albert (2022): Compliance-Untersuchungen im Zeitalter von Big Data und künstlicher Intelligenz. In: Compliance-Berater 10. Available online at https://www.researchgate.net/publication/361276309_Compliance-Untersuchungen_im_Zeitalter_von_Big_Data_und_kunstlicher_Intelligenz, last checked on 23.06.2022

    Abstract: IT-supported tools have been used in compliance investigations for more than two decades, and both their scope of application and their methods have changed considerably over time. On the one hand, the volume of documents, data, and data types to be processed is growing massively; on the other hand, the technical methods for processing this data are becoming ever more powerful. The current question is to what extent new technologies from the fields of big data and artificial intelligence (AI) can unlock automation potential that allows compliance investigations to be conducted better, faster, and at lower cost. This article outlines the current state of practice as well as development potential in the near future.

  • Hauser, Christian; Jehan, Eleanor; Weichselbraun, Albert (2022): Internal Integrity Risk Warning System. Integrity Fund Meeting. Koenig & Bauer Banknote Solutions. Lausanne, 1 July 2022

  • Hauser, Christian; Jehan, Eleanor; Weichselbraun, Albert; Beier, Michael (2022): Whistleblower investigations in the age of Big Data and artificial intelligence. Working Group Meeting. ECS Working Group Whistleblowing. Zürich, 20 June 2022

  • Weichselbraun, Albert; Waldvogel, Roger; Fraefel, Andreas; van Schie, Alexander; Süsstrunk, Norman; Kuntschik, Philipp (2022): Slot Filling for Extracting Reskilling and Upskilling Options from the Web. 27th International Conference on Natural Language & Information Systems (NLDB). Universitat Politècnica de València. Valencia, 17 June 2022. Available online at https://www.youtube.com/watch?v=rIhhKjJAMnY&t=2608s, last checked on 24.11.2022

  • Weichselbraun, Albert; Waldvogel, Roger; Fraefel, Andreas; van Schie, Alexander; Kuntschik, Philipp (2022): Building Knowledge Graphs and Recommender Systems for Suggesting Reskilling and Upskilling Options from the Web. In: Information 13. Available online at https://doi.org/10.3390/info13110510, last checked on 24.11.2022

    Abstract: As advances in science and technology, crises, and increased competition impact labor markets, reskilling and upskilling programs have emerged to mitigate their effects. Since information on continuing education is highly distributed across websites, choosing career paths and suitable upskilling options is currently considered a challenging and cumbersome task. This article, therefore, introduces a method for building a comprehensive knowledge graph from the education providers’ Web pages. We collect educational programs from 488 providers and leverage entity recognition and entity linking methods in conjunction with contextualization to extract knowledge on entities such as prerequisites, skills, learning objectives, and course content. Slot filling then integrates these entities into an extensive knowledge graph that contains close to 74,000 nodes and over 734,000 edges. A recommender system leverages the created graph and background knowledge on occupations to provide career path and upskilling suggestions. Finally, we evaluate the knowledge extraction approach on the CareerCoach 2022 gold standard and draw upon domain experts for judging the career paths and upskilling suggestions provided by the recommender system.
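    The core idea of the abstract above, a graph of courses, skills, and prerequisites that a recommender traverses, can be illustrated with a toy sketch. All node names, the triple layout, and the function are invented for the example; the actual knowledge graph has close to 74,000 nodes.

```python
# Toy knowledge graph encoded as (subject, relation, object) triples.
triples = {
    ("Course:DeepLearning", "requires", "Skill:Python"),
    ("Course:DeepLearning", "teaches", "Skill:NeuralNetworks"),
    ("Course:DataViz", "requires", "Skill:Statistics"),
    ("Course:DataViz", "teaches", "Skill:Visualization"),
}

def suggest_upskilling(user_skills):
    """Suggest courses whose prerequisites the user already covers."""
    courses = {s for s, r, _ in triples if r == "requires"}
    return sorted(
        course for course in courses
        if all(obj in user_skills
               for subj, rel, obj in triples
               if subj == course and rel == "requires")
    )

print(suggest_upskilling({"Skill:Python"}))  # ['Course:DeepLearning']
```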

  • Weichselbraun, Albert; van Schie, Alexander; Fraefel, Andreas; Kuntschik, Philipp; Waldvogel, Roger (2022): Career Coach. Automatische Wissensextraktion und Expertensystem für personalisierte Re- und Upskilling Vorschläge. In: Forster, Michael; Alt, Sharon; Hanselmann, Marcel; Deflorin, Patricia (Eds.): Digitale Transformation an der Fachhochschule Graubünden: Case Studies aus Forschung und Lehre. Chur: FH Graubünden Verlag, pp. 11-18

    Abstract: CareerCoach develops methods for the automatic extraction of continuing-education offerings. The system analyzes the websites of education providers and integrates their offerings into a central knowledge graph that supports innovative services such as semantic search and expert systems.

  • Hauser, Christian; Havelka, Anina; Hörler, Sandro; Weichselbraun, Albert (2021): Towards Developing an Integrity Risk Monitor (IRM). A Status Report. In: Makowicz, Bartosz: Global Ethics, Compliance & Integrity: Yearbook 2021. Bern: Peter Lang, pp. 123-131

    Abstract: Risks, which could jeopardize the integrity of a company, are widespread. This holds true for firms located in Switzerland too. According to a recent study by PricewaterhouseCoopers (2018), almost 40 percent of Swiss companies have been affected by illegal and unethical behavior, such as embezzlement, cybercrime, intellectual property infringements, corruption, fraud, money laundering, and anti-competitive agreements. Although the number of cases in Switzerland is relatively low when compared to other countries globally, the financial damage for affected Swiss companies caused by these incidents is nevertheless above the global average.

  • Hauser, Christian; Weichselbraun, Albert; Havelka, Anina; Hörler, Sandro; Waldvogel, Roger (2021): Integrity Risk Monitor. Chur: FH Graubünden Verlag. Available online at https://www.fhgr.ch/fhgr/unternehmerisches-handeln/schweizerisches-institut-fuer-entrepreneurship-sife/projekte/integrity-risk-monitor-irm/, last checked on 17.03.2022

    Abstract: Corporate integrity has gained importance both nationally and internationally in recent years. The business press repeatedly covers the conduct of companies that fail to live up to their corporate responsibility, and at the same time various stakeholder groups demand more transparency from companies about their activities. This prompts companies to report in their non-financial reporting on their efforts toward integrity in the areas of human rights, the environment, and anti-corruption. Within the Integrity Risk Monitor (IRM) research project, the IRM portal and the IRM dashboard were developed: web-based real-time monitoring instruments. The IRM portal comprises media articles from the last 25 years drawn from a variety of sources; in addition, its algorithm continuously crawls the World Wide Web and collects new articles from editorial media. Using the IRM dashboard's various analysis and visualization options, these articles can be examined to identify relationships, actors, sentiments, and main geographic regions. The project also examined companies' non-financial reporting in order to analyze the relationship between media coverage and non-financial reporting. The results show that both the media and the analyzed companies have reported more on human rights, the environment, and corruption over the last 25 years, but that for the time being there is no direct linear relationship between these two forms of reporting.

  • Hauser, Christian; Weichselbraun, Albert; Jehan, Eleanor; Schmid, Marco (2021): Internal integrity risk warning system (IIRWiS). Integrity Fund Meeting. Koenig & Bauer Banknote Solutions. Online, 29 March 2021

  • Weichselbraun, Albert; Steixner, Jakob; Braşoveanu, Adrian M.P.; Scharl, Arno; Göbel, Max; Nixon, Lyndon J.B. (2021): Automatic Expansion of Domain-Specific Affective Models for Web Intelligence Applications. In: Cognitive Computation. Available online at https://doi.org/10.1007/s12559-021-09839-4, last checked on 18.02.2021

    Abstract: Sentic computing relies on well-defined affective models of different complexity—polarity to distinguish positive and negative sentiment, for example, or more nuanced models to capture expressions of human emotions. When used to measure communication success, even the most granular affective model combined with sophisticated machine learning approaches may not fully capture an organisation’s strategic positioning goals. Such goals often deviate from the assumptions of standardised affective models. While certain emotions such as Joy and Trust typically represent desirable brand associations, specific communication goals formulated by marketing professionals often go beyond such standard dimensions. For instance, the brand manager of a television show may consider fear or sadness to be desired emotions for its audience. This article introduces expansion techniques for affective models, combining common and commonsense knowledge available in knowledge graphs with language models and affective reasoning, improving coverage and consistency as well as supporting domain-specific interpretations of emotions. An extensive evaluation compares the performance of different expansion techniques: (i) a quantitative evaluation based on the revisited Hourglass of Emotions model to assess performance on complex models that cover multiple affective categories, using manually compiled gold standard data, and (ii) a qualitative evaluation of a domain-specific affective model for television programme brands. The results of these evaluations demonstrate that the introduced techniques support a variety of embeddings and pre-trained models. The paper concludes with a discussion on applying this approach to other scenarios where affective model resources are scarce.

  • Weichselbraun, Albert; Kuntschik, Philipp; Francolino, Vincenzo; Saner, Mirco; Dahinden, Urs; Wyss, Vinzenz (2021): Adapting Data-Driven Research to the Fields of Social Sciences and the Humanities. In: Future Internet 13. Available online at doi.org/10.3390/fi13030059, last checked on 18.05.2021

    Abstract: Recent developments in the fields of computer science, such as advances in the areas of big data, knowledge extraction, and deep learning, have triggered the application of data-driven research methods to disciplines such as the social sciences and humanities. This article presents a collaborative, interdisciplinary process for adapting data-driven research to research questions within other disciplines, which considers the methodological background required to obtain a significant impact on the target discipline and guides the systematic collection and formalization of domain knowledge, as well as the selection of appropriate data sources and methods for analyzing, visualizing, and interpreting the results. Finally, we present a case study that applies the described process to the domain of communication science by creating approaches that aid domain experts in locating, tracking, analyzing, and, finally, better understanding the dynamics of media criticism. The study clearly demonstrates the potential of the presented method, but also shows that data-driven research approaches require a tighter integration with the methodological framework of the target discipline to really provide a significant impact on the target discipline.

  • Weichselbraun, Albert (2021): Inscriptis: A Python-based HTML to text conversion library optimized for knowledge extraction from the Web. In: Journal of Open Source Software 6. Available online at https://doi.org/10.21105/joss.03557, last checked on 22.10.2021

    Abstract: Inscriptis provides a library, command line client and Web service for converting HTML to plain text. Its development has been triggered by the need to obtain accurate text representations for knowledge extraction tasks that preserve the spatial alignment of text without drawing upon heavyweight, browser-based solutions such as Selenium (Huggins et al., 2021). In contrast to existing software packages such as HTML2text (Swartz, 2021), jusText (Belica, 2021) and Lynx (Dickey, 2021), Inscriptis 1. provides a layout-aware conversion of HTML that more closely resembles the rendering obtained from standard Web browsers and, therefore, better preserves the spatial arrangement of text elements. Inscriptis excels in terms of conversion quality, since it correctly converts complex HTML constructs such as nested tables and also interprets a subset of HTML (e.g., align, valign) and CSS (e.g., display, white-space, margin-top, vertical-align, etc.) attributes that determine the text alignment. 2. supports annotation rules, i.e., user-provided mappings that allow for annotating the extracted text based on structural and semantic information encoded in HTML tags and attributes used for controlling structure and layout in the original HTML document. These unique features ensure that downstream knowledge extraction components can operate on accurate text representations, and may even use information on the semantics and structure of the original HTML document, if annotation support has been enabled.

  • Braşoveanu, Adrian M.P.; Weichselbraun, Albert; Nixon, Lyndon J.B. (2020): In Media Res: A Corpus for Evaluating Named Entity Linking with Creative Works. In: Fernández, Raquel; Linzen, Tal (Eds.): Proceedings of the 24th Conference on Computational Natural Language Learning: CoNLL 2020: Online, 19-20 November. Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 355-364. Available online at doi.org/10.18653/v1/2020.conll-1.28, last checked on 21.05.2021

    Abstract: Annotation styles express guidelines that direct human annotators in what rules to follow when creating gold standard annotations of text corpora. These guidelines not only shape the gold standards they help create, but also influence the training and evaluation of Named Entity Linking (NEL) tools, since different annotation styles correspond to divergent views on the entities present in the same texts. Such divergence is particularly present in texts from the media domain that contain references to creative works. In this work we present a corpus of 1000 annotated documents selected from the media domain. Each document is presented with multiple gold standard annotations representing various annotation styles. This corpus is used to evaluate a series of Named Entity Linking tools in order to understand the impact of the differences in annotation styles on the reported accuracy when processing highly ambiguous entities such as names of creative works. Relaxed annotation guidelines that include overlap styles lead to better results across all tools.

  • Hauser, Christian; Hörler, Sandro; Weichselbraun, Albert (2020): Development and publication of the Integrity Risk Monitor (IRM). Integrity Fund. Meeting of the project managers. Koenig & Bauer Banknote Solutions. Lausanne, 22 January 2020

  • Hauser, Christian; Weichselbraun, Albert (2020): Applications of Deep Learning in Integrity Management. Integrity Fund. Board Meeting. Koenig & Bauer Banknote Solutions. Online, 14 December 2020

  • Weichselbraun, Albert; Kuntschik, Philipp; Hörler, Sandro (2020): Optimierung von Unternehmensbewertungen durch automatisierte Wissensidentifikation, -extraktion und -integration. In: Information. Wissenschaft & Praxis 71, pp. 321-325. Available online at https://doi.org/10.1515/iwp-2020-2119, last checked on 30.10.2020

    Abstract: Company valuations in the biotech, pharmaceutical, and medical-technology industries are a demanding task, especially when the unique risks that biotech start-ups face when entering new markets are taken into account. Companies specializing in global valuation services therefore combine valuation models and past experience with heterogeneous metrics and indicators that provide insights into a company's performance. This article illustrates how automated knowledge identification, extraction, and integration can be used to (i) identify additional indicators that provide insights into a company's success in product development, and (ii) support labor-intensive data collection processes for company valuation.

  • Weichselbraun, Albert; Kuntschik, Philipp; Hörler, Sandro (2020): Improving Company Valuations with Automated Knowledge Discovery, Extraction and Fusion. English translation of the article: "Optimierung von Unternehmensbewertungen durch automatisierte Wissensidentifikation, -extraktion und -integration". Information - Wissenschaft und Praxis 71 (5-6):321-325. Available online at https://arxiv.org/abs/2010.09249, last checked on 18.05.2021

    Abstract: Performing company valuations within the domain of biotechnology, pharmacy and medical technology is a challenging task, especially when considering the unique set of risks biotech start-ups face when entering new markets. Companies specialized in global valuation services, therefore, combine valuation models and past experience with heterogeneous metrics and indicators that provide insights into a company's performance. This paper illustrates how automated knowledge discovery, extraction and data fusion can be used to (i) obtain additional indicators that provide insights into the success of a company's product development efforts, and (ii) support labor-intensive data curation processes. We apply deep web knowledge acquisition methods to identify and harvest data on clinical trials that is hidden behind proprietary search interfaces and integrate the extracted data into the industry partner's company valuation ontology. In addition, focused Web crawls and shallow semantic parsing yield information on the company's key personnel and respective contact data, notifying domain experts of relevant changes that get then incorporated into the industry partner's company data.

  • Weichselbraun, Albert; Hörler, Sandro; Hauser, Christian; Havelka, Anina (2020): Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence. In: Chbeir, Richard; Manolopoulos, Yannis; Akerkar, Rajendra; Mizera-Pietraszko, Jolanta (Eds.): Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics: WIMS 2020: Biarritz, France, 30 June - 3 July. New York, NY, USA: Association for Computing Machinery (ACM), pp. 54-62. Available online at doi.org/10.1145/3405962.3405988, last checked on 21.05.2021

    Abstract: A substantial number of international corporations have been affected by corruption. The research presented in this paper introduces the Integrity Risks Monitor, an analytics dashboard that applies Web Intelligence and Deep Learning to English- and German-language documents for the task of (i) tracking and visualizing past corruption management gaps and their respective impacts, (ii) understanding present and past integrity issues, (iii) supporting companies in analyzing news media for identifying and mitigating integrity risks. Afterwards, we discuss the design, implementation, training and evaluation of classification components capable of identifying English documents covering the integrity topic of corruption. Domain experts created a gold standard dataset compiled from Anglo-American media coverage on corruption cases that has been used for training and evaluating the classifier. The experiments performed to evaluate the classifiers draw upon popular algorithms used for text classification such as Naïve Bayes, Support Vector Machines (SVM) and Deep Learning architectures (LSTM, BiLSTM, CNN) that build on different word embeddings and document representations. They also demonstrate that although classical machine learning approaches such as Naïve Bayes struggle with the diversity of the media coverage on corruption, state-of-the-art Deep Learning models perform sufficiently well in the project's context.
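    The classical baseline mentioned above, a multinomial Naïve Bayes classifier, fits in a few lines of standard-library Python. The toy documents and labels are invented for illustration; the project's gold standard and Deep Learning models are not reproduced here.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label) pairs; returns a bag-of-words model."""
    counts, totals, labels = defaultdict(Counter), Counter(), Counter()
    for text, label in docs:
        words = text.lower().split()
        counts[label].update(words)   # per-label word frequencies
        totals[label] += len(words)   # per-label token counts
        labels[label] += 1            # per-label document counts
    vocab = {w for c in counts.values() for w in c}
    return counts, totals, labels, vocab

def classify(text, model):
    counts, totals, labels, vocab = model
    n = sum(labels.values())
    best, best_score = None, -math.inf
    for label in labels:
        # log prior plus Laplace-smoothed log likelihoods
        score = math.log(labels[label] / n)
        for w in text.lower().split():
            score += math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("bribery scandal at firm", "corruption"),
        ("officials accused of bribery", "corruption"),
        ("quarterly earnings rose", "other"),
        ("stock earnings report", "other")]
model = train_nb(docs)
print(classify("bribery accusations", model))  # corruption
```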

  • Weichselbraun, Albert; Braşoveanu, Adrian M.P.; Waldvogel, Roger; Odoni, Fabian (2020): Harvest: An Open Source Toolkit for Extracting Posts and Post Metadata from Web Forums: The 20th IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology: A Hybrid Conference with both Online and Offline Modes: Melbourne, Australia, 14-17 December

    Abstract: Web forums discuss topics of long-term, persisting involvements in domains such as health, mobile software development and online gaming, some of which are of high interest from a research and business perspective. In the medical domain, for example, forums contain information on symptoms, drug side effects and patient discussions that are highly relevant for patient-focused healthcare and drug development. Automatic extraction of forum posts and metadata is a crucial but challenging task since forums do not expose their content in a standardized structure. Content extraction methods, therefore, often need customizations such as adaptations to page templates and improvements of their extraction code before they can be deployed to new forums. Most of the current solutions are also built for the more general case of content extraction from web pages and lack key features important for understanding forum content such as the identification of author metadata and information on the thread structure. This paper, therefore, presents a method that determines the XPath of forum posts, eliminating incorrect mergers and splits of the extracted posts that were common in systems from the previous generation. Based on the individual posts further metadata such as authors, forum URL and structure are extracted. We evaluate our approach by creating a gold standard which contains 102 forum pages from 52 different Web forums, and benchmarking against a baseline and competing tools.
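    The idea of inferring the repeating path of forum posts can be sketched with the standard library: count the tag path of every text node and treat the dominant path as the post locator. This is a simplification of the paper's approach, and the markup below is invented.

```python
from collections import Counter
from html.parser import HTMLParser

class PathCounter(HTMLParser):
    """Record the tag path of every text node; the dominant path is a
    candidate locator (XPath-like) for the repeated forum posts."""
    def __init__(self):
        super().__init__()
        self.stack, self.paths = [], Counter()
    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
    def handle_endtag(self, tag):
        if tag in self.stack:  # tolerate unclosed tags, common in forum HTML
            i = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[i:]
    def handle_data(self, data):
        if data.strip():
            self.paths["/" + "/".join(self.stack)] += 1

html = ("<html><body>"
        "<div class='post'><p>first post</p></div>"
        "<div class='post'><p>second post</p></div>"
        "<div class='nav'><span>menu</span></div>"
        "</body></html>")
counter = PathCounter()
counter.feed(html)
post_path = counter.paths.most_common(1)[0][0]
print(post_path)  # /html/body/div/p
```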

  • Weichselbraun, Albert; Hauser, Christian; Hörler, Sandro; Havelka, Anina (2020): Deep learning and visual tools for analyzing and monitoring integrity risks. 5th SwissText & 16th KONVENS Joint Conference. Online, 23-25 June 2020. Available online at https://youtu.be/S9Oxw_UlaW0, last checked on 28.05.2021

  • Odoni, Fabian; Braşoveanu, Adrian M.P.; Kuntschik, Philipp; Weichselbraun, Albert (2019): Introducing orbis. An extendable evaluation pipeline for named entity linking performance drill‐down analyses. In: Blake, Catherine; Brown, Cecelia (Eds.): 82nd Annual Meeting of The Association for Information Science: Proceedings, 56: ASIS&T 2019: Melbourne, Australia, 19-23 October. Somerset, NJ, USA: John Wiley & Sons, Ltd, pp. 468-471. Available online at doi.org/10.1002/pra2.49, last checked on 21.05.2021

    Abstract: Most current evaluation tools are focused solely on benchmarking and comparative evaluations thus only provide aggregated statistics such as precision, recall and F1-measure to assess overall system performance. They do not offer comprehensive analyses up to the level of individual annotations. This paper introduces Orbis, an extendable evaluation pipeline framework developed to allow visual drill-down analyses of individual entities, computed by annotation services, in the context of the text they appear in, in reference to the entities specified in the gold standard.

  • Rinaldi, Fabio; Kuntschik, Philipp; Gottowik, Jürgen; Leddin, Mathias; Esteban, Raul R.; Weichselbraun, Albert; Ellendorff, Tilia; Colic, Nico; Furrer, Lenz (2019): MedMon: social media analytics for an healthcare application. 4th SwissText Analytics Conference. Winterthur, 18-19 June 2019. Available online at https://youtu.be/SA61WJ57XAc, last checked on 28.05.2021

  • Weichselbraun, Albert (2019): Datenakquiseprozesse mittels Big Data optimieren (Einblicke in die Forschung). Available online at https://www.fhgr.ch/fileadmin/publikationen/forschungsbericht/fhgr-Einblicke_in_die_Forschung_2019.pdf, last checked on 09.04.2021

    Abstract: The DISCOVER project develops methods for the automatic acquisition of data and for the extraction and integration of decision-relevant information from heterogeneous online sources; these methods are also capable of analyzing content from the Deep Web.

  • Weichselbraun, Albert; Kuntschik, Philipp; Braşoveanu, Adrian M.P. (2019): Name Variants for Improving Entity Discovery and Linking. In: Eskevich, Maria; Melo, Gerard de; Fäth, Christian; McCrae, John P.; Buitelaar, Paul; Chiarcos, Christian; Klimek, Bettina; Dojchinovski, Milan (Eds.): 2nd Conference on Language, Data and Knowledge: LDK 2019: Leipzig, 20-23 May. Saarbrücken/Wadern: Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing (OASIcs), pp. 14:1-14:15. Available online at https://doi.org/10.4230/OASIcs.LDK.2019.14, last checked on 21.05.2021

    Abstract: Identifying all names that refer to a particular set of named entities is a challenging task, as quite often we need to consider many features that include a lot of variation like abbreviations, aliases, hypocorisms, multilingualism or partial matches. Each entity type can also have specific rules for name variances: people names can include titles, country and branch names are sometimes removed from organization names, while locations are often plagued by the issue of nested entities. The lack of a clear strategy for collecting, processing and computing name variants significantly lowers the recall of tasks such as Named Entity Linking and Knowledge Base Population since name variances are frequently used in all kinds of textual content. This paper proposes several strategies to address these issues. Recall can be improved by combining knowledge repositories and by computing additional variances based on algorithmic approaches. Heuristics and machine learning methods then analyze the generated name variances and mark ambiguous names to increase precision. An extensive evaluation demonstrates the effects of integrating these methods into a new Named Entity Linking framework and confirms that systematically considering name variances yields significant performance improvements.
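    A minimal sketch of the algorithmic variant generation described above; the function and the variant rules are illustrative, not the paper's actual rule set.

```python
def name_variants(name):
    """Generate simple surface variants of a person name."""
    parts = name.split()
    first, last = parts[0], parts[-1]
    return {
        name,                   # full form
        f"{first[0]}. {last}",  # abbreviated given name
        f"{last}, {first}",     # inverted order
        last,                   # surname only - highly ambiguous, so
                                # downstream heuristics should flag it
    }

print(sorted(name_variants("Albert Weichselbraun")))
```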

  • Weichselbraun, Albert; Braşoveanu, Adrian M.P.; Kuntschik, Philipp; Nixon, Lyndon J.B. (2019): Improving Named Entity Linking Corpora Quality. In: Angelova, Galia; Mitkov, Ruslan; Nikolova, Ivelina; Temnikova, Irina (Eds.): Natural Language Processing in a Deep Learning World: Proceedings: International Conference Recent Advances in Natural Language Processing (RANLP 2019): Varna, Bulgaria, 2-4 September. Bulgaria: Ltd., Shoumen, pp. 1328-1337. Available online at https://doi.org/10.26615/978-954-452-056-4_152, last checked on 21.05.2021

    Abstract: Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems. The quality of the used corpora and the used evaluation metrics are crucial in this process. We, therefore, assess the quality of three popular evaluation corpora, identifying four major issues which affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) Knowledge Base evolution, (iv) and differences in annotating co-occurrences. This paper addresses these issues by formalizing NEL annotations and corpus versioning which allows standardizing corpus creation, supports corpus evolution, and paves the way for the use of lenses to automatically transform between different corpus configurations. In addition, the use of clearly defined scoring rules and evaluation metrics ensures a better comparability of evaluation results.

  • Weichselbraun, Albert (2019): Capturing, analyzing and visualizing user generated content from social media. 27th Conference on Intelligent Systems for Molecular Biology (ISMB); 17th European Conference on Computational Biology (ECCB); Special session on Social media mining for drug discovery research: challenges and opportunities of Real World Text. Basel, 21-25 June 2019

     

    Abstract: Source format variability and noise are major challenges when harvesting content from social media. This presentation discusses methods and abstractions for gathering user generated content from Web pages and social media platforms, covering (i) structured content, (ii) platforms that leverage Semantic Web standards such as Microformats, RDFa and JSON-LD, and (iii) semi-structured or even unstructured content that is typically found in Web forums. We then discuss pre-processing and anonymization tasks and outline how the collected content is annotated, aggregated and summarized in a so-called contextualized information space. An interactive dashboard provides efficient means for analyzing, browsing and visualizing this information space. The dashboard supports analysts in identifying emerging trends and topics, exploring the lexical, geospatial and relational context of topics and entities such as health conditions, diseases, symptoms and drugs, and performing drill-down analysis to shed light on individual posts and statements that cause the observed effects.
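
    The structured-content channel in (ii) can be illustrated in a few lines of Python; the `JsonLdExtractor` class and the sample page below are illustrative assumptions, not part of the presented toolchain:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect <script type="application/ld+json"> payloads, one of the
    structured-content channels mentioned above."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        self.in_jsonld = (tag == "script"
                          and dict(attrs).get("type") == "application/ld+json")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld and data.strip():
            self.blocks.append(json.loads(data))

page = ('<html><body><script type="application/ld+json">'
        '{"@type": "BlogPosting", "headline": "Hello"}'
        '</script></body></html>')
parser = JsonLdExtractor()
parser.feed(page)
# parser.blocks now holds the embedded metadata dictionaries.
```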


  • Braşoveanu, Adrian M.P.; Nixon, Lyndon J.B.; Weichselbraun, Albert (2018): StoryLens: A Multiple Views Corpus for Location and Event Detection. In: Akerkar, Rajendra; Ivanović, Mirjana; Kim, Sang-Wook; Manolopoulos, Yannis; Rosati, Riccardo; Savić, Miloš; Badica, Costin; Radovanović, Miloš (Hg.): Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, Article No.: 30: WIMS '18: Novi Sad, Serbia, 25.-27. Juni: New York, NY, USA: Association for Computing Machinery (ACM). Available online at https://doi.org/10.1145/3227609.3227674, last checked on 21.05.2021

     

    Abstract: The news media landscape tends to focus on long-running narratives. Correctly processing new information, therefore, requires considering multiple lenses when analyzing media content. Traditionally it would have been considered sufficient to extract the topics or entities contained in a text in order to classify it, but today it is important to also look at more sophisticated annotations related to fine-grained geolocation, events, stories and the relations between them. In order to leverage such lenses we propose a new corpus that offers a diverse set of annotations over texts collected from multiple media sources. We also showcase the framework used for creating the corpus, as well as how the information from the various lenses can be used in order to support different use cases in the EU project InVID for verifying the veracity of online video.


  • Braşoveanu, Adrian M.P.; Rizzo, Giuseppe; Kuntschik, Philipp; Weichselbraun, Albert; Nixon, Lyndon J.B. (2018): Framing Named Entity Linking Error Types. In: Calzolari, Nicoletta; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Hasida, Koiti; Isahara, Hitoshi; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios; Tokunaga, Takenobu (Hg.): Eleventh International Conference on Language Resources and Evaluation: Conference Proceedings. In collaboration with Sara Goggi and Hélène Mazo: LREC '18: Miyazaki, Japan, 7.-12. Mai: Paris: European Language Resources Association (ELRA), S. 266-271. Available online at https://www.aclweb.org/anthology/L18-1040/, last checked on 21.05.2021

     

    Abstract: Named Entity Linking (NEL) and relation extraction form the backbone of Knowledge Base Population tasks. The recent rise of large open source Knowledge Bases and the continuous focus on improving NEL performance have led to the creation of automated benchmark solutions during the last decade. The benchmarking of NEL systems offers a valuable approach to understanding a NEL system's performance quantitatively. However, an in-depth qualitative analysis that helps improve NEL methods by identifying error causes usually requires a more thorough error analysis. This paper proposes a taxonomy to frame common errors and applies this taxonomy in a survey study to assess the performance of four well-known Named Entity Linking systems on three recent gold standards.


  • Odoni, Fabian; Kuntschik, Philipp; Braşoveanu, Adrian M.P.; Weichselbraun, Albert (2018): On the Importance of Drill-Down Analysis for Assessing Gold Standards and Named Entity Linking Performance. SEMANTiCS 2018: 14th International Conference on Semantic Systems. In: Procedia Computer Science 137, S. 33-42. Available online at https://doi.org/10.1016/j.procs.2018.09.004, last checked on 21.05.2021

     

    Abstract: Rigorous evaluations and analyses of evaluation results are key towards improving Named Entity Linking systems. Nevertheless, most current evaluation tools are focused on benchmarking and comparative evaluations. Therefore, they only provide aggregated statistics such as precision, recall and F1-measure to assess system performance and no means for conducting detailed analyses down to the level of individual annotations. This paper addresses the need for transparent benchmarking and fine-grained error analysis by introducing Orbis, an extensible framework that supports drill-down analysis, multiple annotation tasks and resource versioning. Orbis complements approaches like those deployed through the GERBIL and TAC KBP tools and helps developers to better understand and address shortcomings in their Named Entity Linking tools. We present three use cases in order to demonstrate the usefulness of Orbis for both research and production systems: (i) improving Named Entity Linking tools; (ii) detecting gold standard errors; and (iii) performing Named Entity Linking evaluations with multiple versions of the included resources.
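
    The contrast between aggregated statistics and per-annotation drill-down can be sketched as follows; the `Annotation` tuple and the sample data are illustrative assumptions, not Orbis's actual data model:

```python
from collections import namedtuple

# Hypothetical minimal annotation: document id, character offsets, linked entity.
Annotation = namedtuple("Annotation", "doc start end entity")

def micro_scores(gold, predicted):
    """Micro-averaged precision, recall and F1 over exact-match annotations."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def drill_down(gold, predicted):
    """Per-annotation error listing instead of aggregated statistics only."""
    gold, predicted = set(gold), set(predicted)
    return {"false_positives": sorted(predicted - gold),
            "false_negatives": sorted(gold - predicted)}

gold = [Annotation("d1", 0, 5, "dbpedia:Basel"),
        Annotation("d1", 10, 14, "dbpedia:FHGR")]
pred = [Annotation("d1", 0, 5, "dbpedia:Basel"),
        Annotation("d1", 20, 24, "dbpedia:IBM")]
precision, recall, f1 = micro_scores(gold, pred)   # 0.5, 0.5, 0.5
errors = drill_down(gold, pred)                    # one FP, one FN
```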


  • Weichselbraun, Albert (2018): Optimierung von Karriere- und Recruitingprozessen mittels Web Analytics und künstlicher Intelligenz (Einblicke in die Forschung). Available online at https://www.fhgr.ch/fileadmin/publikationen/forschungsbericht/fhgr-Einblicke_in_die_Forschung_2018.pdf, last checked on 09.04.2021

     

    Abstract: Automated methods can support the targeted search for qualified candidates, the analysis of career trajectories, and processes for career planning and continuing education.


  • Weichselbraun, Albert; Kuntschik, Philipp; Braşoveanu, Adrian M.P. (2018): Mining and Leveraging Background Knowledge for Improving Named Entity Linking. In: Akerkar, Rajendra; Ivanović, Mirjana; Kim, Sang-Wook; Manolopoulos, Yannis; Rosati, Riccardo; Savić, Miloš; Badica, Costin; Radovanović, Miloš (Hg.): Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, Article No.: 27: WIMS '18: Novi Sad, Serbia, 25.-27. Juni: New York, NY, USA: Association for Computing Machinery (ACM). Available online at https://doi.org/10.1145/3227609.3227670, last checked on 21.05.2021

     

    Abstract: Knowledge-rich Information Extraction (IE) methods aspire to combine classical IE with background knowledge obtained from third-party resources. Linked Open Data repositories that encode billions of machine-readable facts from sources such as Wikipedia play a pivotal role in this development. The recent growth of Linked Data adoption for Information Extraction tasks has shed light on many data quality issues in these data sources, such as completeness, timeliness and semantic correctness, that seriously challenge their usefulness. Information Extraction methods are, therefore, faced with problems such as name variance and type confusability. If multiple linked data sources are used in parallel, additional concerns regarding link stability and entity mappings emerge. This paper develops methods for integrating Linked Data into Named Entity Linking methods and addresses challenges in regard to mining knowledge from Linked Data, mitigating data quality issues, and adapting algorithms to leverage this knowledge. Finally, we apply these methods to Recognyze, a graph-based Named Entity Linking (NEL) system, and provide a comprehensive evaluation which compares its performance to that of other well-known NEL systems, demonstrating the impact of the suggested methods on its entity linking performance.


  • Weichselbraun, Albert (2018): On the convergence of Artificial Intelligence and Big Data. Potential, challenges and impact. Keynote. Graubünden forscht. Academia Raetica. Davos, 19. September, 2018


  • Weichselbraun, Albert; Kuntschik, Philipp; Süsstrunk, Norman; Odoni, Fabian; Braşoveanu, Adrian M.P. (2018): Optimizing Information Acquisition and Decision Making Processes with Natural Language Processing, Machine Learning and Visual Analytics. 3rd SwissText Analytics Conference. Winterthur, 12.-13. Juni, 2018. Available online at https://youtu.be/YicWN1rEn7M, last checked on 28.05.2021

     


  • Marx, Edgard; Shekarpour, Saeedeh; Soru, Tommaso; Braşoveanu, Adrian M.P.; Saleem, Muhammad; Baron, Ciro; Weichselbraun, Albert; Lehmann, Jens; Ngomo, Axel-Cyrille Ngonga; Auer, Sören (2017): Torpedo: Improving the State-of-the-Art RDF Dataset Slicing: 11th International Conference on Semantic Computing: ICSC: San Diego, CA, USA, 30. Januar - 1. Februar: Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE), S. 149-156. Available online at https://doi.org/10.1109/ICSC.2017.79, last checked on 21.05.2021

     

    Abstract: Over the last years, the amount of data published as Linked Data on the Web has grown enormously. In spite of the high availability of Linked Data, organizations still encounter an accessibility challenge while consuming it. This is mostly due to the large size of some of the datasets published as Linked Data. The core observation behind this work is that a subset of these datasets suffices to address the needs of most organizations. In this paper, we introduce Torpedo, an approach for efficiently selecting and extracting relevant subsets from RDF datasets. In particular, Torpedo adds optimization techniques to reduce seek operation costs, as well as support for multi-join graph patterns and SPARQL FILTERs that enable more granular data selection. We compare the performance of our approach with existing solutions on nine different queries against four datasets. Our results show that our approach is highly scalable and is up to 26% faster than the current state-of-the-art RDF dataset slicing approach.
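
    As an illustration of the slicing idea (not Torpedo's implementation, which operates on serialized RDF dumps), the sketch below selects triples from an in-memory list by a basic graph pattern with a FILTER-like constraint; the function name and sample data are hypothetical:

```python
def slice_triples(triples, pattern, constraint=lambda triple: True):
    """Return triples matching an (s, p, o) pattern; None acts as a wildcard.

    `constraint` plays the role of a SPARQL FILTER over each matched triple."""
    out = []
    for s, p, o in triples:
        matches = all(want is None or want == got
                      for want, got in zip(pattern, (s, p, o)))
        if matches and constraint((s, p, o)):
            out.append((s, p, o))
    return out

data = [
    (":Basel", ":population", 178000),
    (":Chur", ":population", 37000),
    (":Basel", ":country", ":CH"),
]
# All population triples whose value exceeds 100 000:
big = slice_triples(data, (None, ":population", None),
                    constraint=lambda t: t[2] > 100000)
```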


  • Scharl, Arno; Herring, David; Rafelsberger, Walter; Hubmann-Haidvogel, Alexander; Kamolov, Ruslan; Fischl, Daniel; Fols, Michael; Weichselbraun, Albert (2017): Semantic Systems and Visual Tools to Support Environmental Communication. In: IEEE Systems Journal 11, S. 762-771. Available online at https://doi.org/10.1109/JSYST.2015.2466439, last checked on 24.07.2020

     

    Abstract: Given the intense attention that environmental topics such as climate change attract in news and social media coverage, scientists and communication professionals want to know how different stakeholders perceive observable threats and policy options, how specific media channels react to new insights, and how journalists present scientific knowledge to the public. This paper investigates the potential of semantic technologies to address these questions. After summarizing methods to extract and disambiguate context information, we present visualization techniques to explore the lexical, geospatial, and relational context of topics and entities referenced in these repositories. The examples stem from the Media Watch on Climate Change, the Climate Resilience Toolkit and the NOAA Media Watch, three applications that aggregate environmental resources from a wide range of online sources. These systems not only show the value of providing comprehensive information to the public, but also have helped to develop a novel communication success metric that goes beyond bipolar assessments of sentiment.


  • Weichselbraun, Albert; Gindl, Stefan; Fischer, Fabian; Vakulenko, Svitlana; Scharl, Arno (2017): Aspect-Based Extraction and Analysis of Affective Knowledge from Social Media Streams. In: IEEE Intelligent Systems 32, S. 80-88. Available online at doi.org/10.1109/MIS.2017.57, last checked on 18.05.2021

     

    Abstract: Extracting and analyzing affective knowledge from social media in a structured manner is a challenging task. Decision makers require insights into the public perception of a company's products and services, as a strategic feedback channel to guide communication campaigns, and as an early warning system to quickly react in the case of unforeseen events. The approach presented in this article goes beyond bipolar metrics of sentiment. It combines factual and affective knowledge extracted from rich public knowledge bases to analyze emotions expressed toward specific entities (targets) in social media. The authors obtain common and common-sense domain knowledge from DBpedia and ConceptNet to identify potential sentiment targets. They employ affective knowledge about emotional categories available from SenticNet to assess how those targets and their aspects (such as specific product features) are perceived in social media. An evaluation shows the usefulness and correctness of the extracted domain knowledge, which is used in a proof-of-concept data analytics application to investigate the perception of car brands on social media in the period between September and November 2015.


  • Weichselbraun, Albert; Kuntschik, Philipp (2017): Mitigating linked data quality issues in knowledge-intense information extraction methods. In: Akerkar, Rajendra; Cuzzocrea, Alfredo; Cao, Jannong; Hacid, Mohand-Said (Hg.): Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, Article No.: 17: WIMS '17: Amantea, Italy, 19.-22. Juni: New York, NY, USA: Association for Computing Machinery (ACM). Available online at https://doi.org/10.1145/3102254.3102272, last checked on 21.05.2021

     

    Abstract: Advances in research areas such as named entity linking and sentiment analysis have triggered the emergence of knowledge-intensive information extraction methods that combine classical information extraction with background knowledge from the Web. Despite data quality concerns, linked data sources such as DBpedia, GeoNames and Wikidata which encode facts in a standardized structured format are particularly attractive for such applications. This paper addresses the problem of data quality by introducing a framework that elaborates on linked data quality issues relevant to different stages of the background knowledge acquisition process, their impact on information extraction performance and applicable mitigation strategies. Applying this framework to named entity linking and data enrichment demonstrates the potential of the introduced mitigation strategies to lessen the impact of different kinds of data quality problems. An industrial use case that aims at the automatic generation of image metadata from image descriptions illustrates the successful deployment of knowledge-intensive information extraction in real-world applications and constraints introduced by data quality concerns.


  • Braşoveanu, Adrian M.P.; Nixon, Lyndon J.B.; Weichselbraun, Albert; Scharl, Arno (2016): A Regional News Corpora for Contextualized Entity Discovery and Linking. In: Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Goggi, Sara; Grobelnik, Marko; Maegaard, Bente; Mariani, Joseph; Mazo, Hélène; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (Hg.): Tenth International Conference on Language Resources and Evaluation: Conference Proceedings: LREC '16: Portorož, Slovenia, Mai: Paris: European Language Resources Association (ELRA), S. 3333-3338. Available online at https://www.aclweb.org/anthology/L16-1531, last checked on 21.05.2021

     

    Abstract: This paper presents a German corpus for Named Entity Linking (NEL) and Knowledge Base Population (KBP) tasks. We describe the annotation guideline, the annotation process, NIL clustering techniques and conversion to popular NEL formats such as NIF and TAC that have been used to construct this corpus based on news transcripts from the German regional broadcaster RBB (Rundfunk Berlin Brandenburg). Since creating such language resources requires significant effort, the paper also discusses how to derive additional evaluation resources for tasks like named entity contextualization or ontology enrichment by exploiting the links between named entities from the annotated corpus. The paper concludes with an evaluation that shows how several well-known NEL tools perform on the corpus, a discussion of the evaluation results, and with suggestions on how to keep evaluation corpora and datasets up to date.
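
    The NIL clustering step mentioned above can be approximated by grouping mentions with matching normalized surface forms; the function and the sample mentions below are illustrative assumptions, not the guideline's actual procedure:

```python
from itertools import groupby

def nil_clusters(mentions):
    """Group NIL mentions (entities without a knowledge-base entry)
    whose normalized surface forms coincide."""
    norm = lambda m: " ".join(m.lower().split())
    ordered = sorted(mentions, key=norm)
    return [list(group) for _, group in groupby(ordered, key=norm)]

clusters = nil_clusters(["RBB  Inforadio", "rbb inforadio", "Abendschau"])
# Two clusters: the two Inforadio variants collapse into one.
```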


  • Scharl, Arno; Hubmann-Haidvogel, Alexander; Jones, Alistair; Fischl, Daniel; Kamolov, Ruslan; Weichselbraun, Albert; Rafelsberger, Walter (2016): Analyzing the public discourse on works of fiction. Detection and visualization of emotion in online coverage about HBO's Game of Thrones. In: Information processing & management 52, S. 129-138. Available online at doi.org/10.1016/j.ipm.2015.02.003, last checked on 18.05.2021

     

    Abstract: This paper presents a Web intelligence portal that captures and aggregates news and social media coverage about "Game of Thrones", an American drama television series created for the HBO television network based on George R.R. Martin's series of fantasy novels. The system collects content from the Web sites of Anglo-American news media as well as from four social media platforms: Twitter, Facebook, Google+ and YouTube. An interactive dashboard with trend charts and synchronized visual analytics components not only shows how often Game of Thrones events and characters are being mentioned by journalists and viewers, but also provides a real-time account of concepts that are being associated with the unfolding storyline and each new episode. Positive or negative sentiment is computed automatically, which sheds light on the perception of actors and new plot elements.


  • Scharl, Arno; Weichselbraun, Albert; Göbel, Max; Rafelsberger, Walter; Kamolov, Ruslan (2016): Scalable Knowledge Extraction and Visualization for Web Intelligence. In: Bui, Tung X.; Sprague, Ralph H. (Hg.): Proceedings of the 49th Annual Hawaii International Conference on System Sciences: 49th Hawaii International Conference on System Sciences (HICSS): Koloa, HI, 5.-8. Januar: Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE), S. 3749-3757. Available online at https://doi.org/10.1109/HICSS.2016.467, last checked on 21.05.2021

     

    Abstract: Understanding stakeholder perceptions and assessing the impact of campaigns are key questions of communication experts. Web intelligence platforms help to answer such questions, provided that they are scalable enough to analyze and visualize information flows from volatile online sources in real time. This paper presents a distributed architecture for aggregating Web content repositories from Web sites and social media streams, memory-efficient methods to extract factual and affective knowledge, and interactive visualization techniques to explore the extracted knowledge. The presented examples stem from the Media Watch on Climate Change, a public Web portal that aggregates environmental content from a range of online sources.


  • Vakulenko, Svitlana; Weichselbraun, Albert; Scharl, Arno (2016): Detection of Valid Sentiment-Target Pairs in Online Product Reviews and News Media Articles: IEEE/WIC/ACM International Conference on Web Intelligence: Proceedings: WI: Omaha, NE, USA, 13.-16. Oktober: Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE), S. 97-104. Available online at https://doi.org/10.1109/WI.2016.0024, last checked on 21.05.2021

     

    Abstract: This paper investigates the linking of sentiments to their respective targets, a sub-task of fine-grained sentiment analysis. Many different features have been proposed for this task, but often without a formal evaluation. We employ a recursive feature elimination approach to identify features that optimize predictive performance. Our experimental evaluation draws upon two corpora of product reviews and news articles annotated with sentiments and their targets. We introduce competitive baselines, outline the performance of the proposed approach, and report the most useful features for sentiment target linking. The results help to better understand how sentiment-target relations are expressed in the syntactic structure of natural language, and how this information can be used to build systems for fine-grained sentiment analysis.
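
    The recursive feature elimination idea can be sketched generically: repeatedly drop the feature whose removal hurts the evaluation score the least. The toy scoring function and feature names below are assumptions for illustration, not the paper's actual classifier or feature set:

```python
def rfe(features, score, keep=1):
    """Greedy backward feature elimination down to `keep` features."""
    selected = list(features)
    while len(selected) > keep:
        # Score the feature set with each single feature removed.
        trials = {f: score([g for g in selected if g != f]) for f in selected}
        # Drop the feature whose removal leaves the best remaining score.
        least_useful = max(trials, key=trials.get)
        selected.remove(least_useful)
    return selected

# Toy stand-in scorer: pretend only two features carry predictive signal.
WEIGHTS = {"dependency_path": 0.6, "distance": 0.3, "pos_tag": 0.05}

def score(feats):
    return sum(WEIGHTS.get(f, 0.0) for f in feats)

best = rfe(["dependency_path", "distance", "pos_tag", "capitalized"],
           score, keep=2)
```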


  • Weichselbraun, Albert (2016): Big Data Technologien für KMUs. Bund fördert Investitionen. Blog (FHGR Blog). Available online at https://blog.fhgr.ch/blog/big-data-technologien-fuer-kmus-der-bund-foerdert-investitionen/, last checked on 28.03.2021

     

    Abstract: Data has been called the fuel of the twenty-first century. The ability to analyze internal and external data and to draw on it for decision-making, for optimizing business processes and for developing new products will, in the near future, play a key role in the competitiveness of companies and public institutions.


  • Weichselbraun, Albert; Scharl, Arno; Gindl, Stefan (2016): Extracting Opinion Targets from Environmental Web Coverage and Social Media Streams. In: Bui, Tung X.; Sprague, Ralph H. (Hg.): Proceedings of the 49th Annual Hawaii International Conference on System Sciences: 49th Hawaii International Conference on System Sciences (HICSS): Koloa, HI, 5.-8. Januar: Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE), S. 1040-1048. Available online at https://doi.org/10.1109/HICSS.2016.133, last checked on 21.05.2021

     

    Abstract: Policy makers and environmental organizations have a keen interest in awareness building and the evolution of stakeholder opinions on environmental issues. Mere polarity detection, as provided by many existing methods, does not suffice to understand the emergence of collective awareness. Methods for extracting affective knowledge should be able to pinpoint opinion targets within a thread. Opinion target extraction provides a more accurate and fine-grained identification of opinions expressed in online media. This paper compares two different approaches for identifying potential opinion targets and applies them to comments from the YouTube video sharing platform. The first approach is based on statistical keyword analysis in conjunction with sentiment classification on the sentence level. The second approach uses dependency parsing to pinpoint the target of an opinionated term. A case study based on YouTube postings applies the developed methods and measures their ability to handle noisy input data from social media streams.
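
    The first approach (statistical keyword analysis combined with sentence-level sentiment classification) might be sketched as follows; the mini lexicon, function names and sample comments are illustrative assumptions, far simpler than the paper's actual resources:

```python
import re
from collections import Counter

# Hypothetical mini sentiment lexicon for the sketch.
POSITIVE = {"great", "love", "clean"}
NEGATIVE = {"awful", "dirty", "hate"}

def sentence_polarity(sentence):
    """Crude sentence-level polarity: positive minus negative lexicon hits."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return len(words & POSITIVE) - len(words & NEGATIVE)

def keyword_targets(comments, top_n=3):
    """Propose frequent terms from opinionated (non-neutral) sentences
    as candidate opinion targets."""
    counts = Counter()
    for comment in comments:
        for sentence in re.split(r"[.!?]", comment):
            if sentence_polarity(sentence) != 0:
                counts.update(w for w in re.findall(r"[a-z']+", sentence.lower())
                              if w not in POSITIVE | NEGATIVE and len(w) > 3)
    return [w for w, _ in counts.most_common(top_n)]

comments = ["I love the wind turbines.", "The coal plant is awful.",
            "Wind turbines are great."]
targets = keyword_targets(comments)
```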


  • Weichselbraun, Albert (2016): Chances and Challenges in Text and Data Mining. IARU Library Meeting in Zurich. Zürich, 9. Juni, 2016


  • Dahinden, Urs; Francolino, Vincenzo; Weichselbraun, Albert (2015): Risikokommunikation zum Stromnetzausbau. Ergebnisse einer international vergleichenden Inhaltsanalyse von Massenmedien und Online-Medien. SGKM-Jahrestagung. Schweizerische Gesellschaft für Kommunikations- und Medienwissenschaft. Bern, 14. März, 2015


  • Dahinden, Urs; Weichselbraun, Albert; Schuldt, Karsten; Francolino, Vincenzo; Odoni, Fabian (2015): Swiss System for Monitoring bibliographic data and Holistic publication behavior analysis (SYMPHONY): Requirement analysis. Final report of the project SYMPHONY (142-008) in the swissuniversities program: SUC 2013-2016 P-2: „Scientific information: Access, processing and safeguarding“. Chur, Version 1.2, 2. September. Available online at https://www.fhgr.ch/fhgr/angewandte-zukunftstechnologien/schweizerisches-institut-fuer-informationswissenschaft-sii/projekte/symphony/, last checked on 03.07.2020

     

    Abstract: The objective of the “Swiss System for Monitoring bibliographic data and Holistic publication behavior analysis” (SYMPHONY) project was to set up a study that is able to monitor the publication behavior of researchers in Switzerland in a systematic and continuing way. Due to the complexity of these tasks and the high number of stakeholders involved, SYMPHONY was conceptualized as a pre-study that identifies and analyzes the requirements of the key stakeholders towards such a system. Several methods were used to reach the project goal: In a first step, a review of the international literature gave valuable insights into the potential, but also the problems, associated with current approaches towards monitoring publication behavior by means of bibliometrics (e.g. bias against Open Access publication formats). As a second methodological step, the project team ran a stakeholder dialog that included 40 interviews with key stakeholders and experts in the field (all universities and most universities of applied sciences, a selection of research organizations, funding agencies, bibliometric experts, etc.). This stakeholder dialog was necessary in order to take the considerable heterogeneity and decentralized structure of the Swiss science system into account. The interview partners were asked about their current practice of measuring the quantity and quality of scientific output, with a focus on publication monitoring (technical infrastructure, financial resources, organizational guidelines and processes), and about their needs and requirements for a new or adapted infrastructure. The expert interviews clearly showed that the majority of stakeholders in the Swiss science system consider the current status quo of bibliographic data collection and publication analysis problematic, because a number of scientific disciplines (social sciences and humanities) and a considerable number of scientific publication formats (e.g. a narrow selection of books and book chapters, the exclusion of peer-reviewed journals that are not included in the dominant bibliometric database) are not adequately represented in the dominant bibliometric systems (e.g. Web of Science by Thomson Reuters). Based on the findings from the expert interviews, the project team developed the following four scenarios: (1) maintain the status quo, (2) perform targeted studies, (3) create a new infrastructure for monitoring the publication behavior of Swiss scientists, (4) scenario (3) plus a framework for assessing the societal impact of publications, projects and institutions. These scenarios were presented to the experts and stakeholders at the project workshop with the opportunity to comment and to provide feedback. One important result of the workshop was that the participants recommended focusing on scenario 3 for the further project development, aiming at the creation of a new infrastructure with a clearly and narrowly defined task: to monitor the publication behavior of Swiss scientists. Based on the feedback from the stakeholder workshop, the project team developed a revised and detailed version of scenario 3, which was considered the best approach to meet the ambitious goals set by the White Paper. The final chapter lists the requirements for the current and future monitoring of scientific publications in Switzerland and gives a preview of the planned follow-up project “SYMPHONY - Proof of concept”.


  • Dahinden, Urs; Weichselbraun, Albert (2015): Welche Rolle spielt Open Access in der Forschungsevaluation? Open Access Tage. Zürich, 8. September, 2015. Available online at https://open-access.net/community/open-access-tage/open-access-tage-2015-zuerich/programm, last checked on 28.05.2021

     


  • Weichselbraun, Albert; Streiff, Daniel; Scharl, Arno (2015): Consolidating Heterogeneous Enterprise Data for Named Entity Linking and Web Intelligence. In: International Journal on Artificial Intelligence Tools 24. Available online at https://doi.org/10.1142/S0218213015400084, last checked on 24.07.2020

     

    Abstract: Linking named entities to structured knowledge sources paves the way for state-of-the-art Web intelligence applications which assign sentiment to the correct entities, identify trends, and reveal relations between organizations, persons and products. For this purpose this paper introduces Recognyze, a named entity linking component that uses background knowledge obtained from linked data repositories, and outlines the process of transforming heterogeneous data silos within an organization into a linked enterprise data repository which draws upon popular linked open data vocabularies to foster interoperability with public data sets. The presented examples use comprehensive real-world data sets from Orell Füssli Business Information, Switzerland's largest business information provider. The linked data repository created from these data sets comprises more than nine million triples on companies, the companies' contact information, key people, products and brands. We identify the major challenges of tapping into such sources for named entity linking, and describe required data pre-processing techniques to use and integrate such data sets, with a special focus on disambiguation and ranking algorithms. Finally, we conduct a comprehensive evaluation based on business news from the Neue Zürcher Zeitung and AWP Financial News to illustrate how these techniques improve the performance of the Recognyze named entity linking component.
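
    The candidate-ranking step can be illustrated with a deliberately simple bag-of-words overlap; Recognyze's actual graph-based disambiguation is considerably more sophisticated, and the candidate profiles below are invented for the example:

```python
def rank_candidates(mention_context, candidates):
    """Rank candidate entities for a mention by word overlap between the
    mention's context and each candidate's textual profile."""
    ctx = set(mention_context.lower().split())
    scored = sorted(candidates.items(),
                    key=lambda kv: len(ctx & set(kv[1].lower().split())),
                    reverse=True)
    return [name for name, _ in scored]

candidates = {
    "Orell Füssli (publisher)": "swiss publisher books printing",
    "Orell Füssli Business Information": "business information provider company data",
}
ranking = rank_candidates("largest business information provider in switzerland",
                          candidates)
# The business-information entity outranks the publisher for this context.
```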
