All Publications
Overview



  • Kaplan, Himmet; Weichselbraun, Albert; Braşoveanu, Adrian M.P. (2023): Integrating Economic Theory, Domain Knowledge, and Social Knowledge into Hybrid Sentiment Models for Predicting Crude Oil Markets. In: Cognitive Computation, last checked on 31.03.2023

    Abstract: For several decades, sentiment analysis has been considered a key indicator for assessing market mood and predicting future price changes. Accurately predicting commodity markets requires an understanding of fundamental market dynamics such as the interplay between supply and demand, which are not considered in standard affective models. This paper introduces two domain-specific affective models, CrudeBERT and CrudeBERT+, that adapt sentiment analysis to the crude oil market by incorporating economic theory with common knowledge of the mentioned entities and social knowledge extracted from Google Trends. To evaluate the predictive capabilities of these models, comprehensive experiments were conducted using dynamic time warping to identify the model that best approximates WTI crude oil futures price movements. The evaluation included news headlines and crude oil prices between January 2012 and April 2021. The results show that CrudeBERT+ outperformed RavenPack, BERT, FinBERT, and early CrudeBERT models during the 9-year evaluation period and within most of the individual years that were analyzed. The success of the introduced domain-specific affective models demonstrates the potential of integrating economic theory with sentiment analysis and external knowledge sources to improve the predictive power of financial sentiment analysis models. The experiments also confirm that CrudeBERT+ has the potential to provide valuable insights for decision-making in the crude oil market.

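The dynamic time warping (DTW) evaluation mentioned in the abstract above compares how closely a sentiment series tracks a price series even when the two are shifted in time. The following is a generic textbook sketch of DTW, not the authors' implementation:

```python
# Dynamic time warping (DTW) distance between two 1-D series.
# Generic illustration only; the paper's actual setup is not reproduced here.

def dtw_distance(a: list[float], b: list[float]) -> float:
    """Return the DTW alignment cost between series a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# A lower DTW distance means one series approximates the other more
# closely; a series shifted by one step still aligns at zero cost.
print(dtw_distance([0.0, 1.0], [0.0, 0.0, 1.0]))  # → 0.0
```

A model whose sentiment curve yields the lowest DTW distance to the WTI price curve would be the best approximation in the sense described above.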

  • Rölke, Heiko; Weichselbraun, Albert (2023): Ontologien und Linked Open Data. In: Kuhlen, Rainer; Lewandowski, Dirk; Semar, Wolfgang; Womser-Hacker, Christa (Eds.): Grundlagen der Informationswissenschaft: 7th, completely revised edition. Berlin: De Gruyter, pp. 257-269. Available online at https://doi.org/10.1515/9783110769043-022, last checked on 16.12.2022

     


  • Beier, Michael; Hauser, Christian; Weichselbraun, Albert (2022): Compliance-Untersuchungen im Zeitalter von Big Data und künstlicher Intelligenz. In: Compliance-Berater 10. Available online at https://www.researchgate.net/publication/361276309_Compliance-Untersuchungen_im_Zeitalter_von_Big_Data_und_kunstlicher_Intelligenz, last checked on 23.06.2022

     

    Abstract: IT-supported tools have been used in compliance investigations for more than two decades. Over time, both the scope of application and the methods have changed considerably. On the one hand, the volume of documents, data, and data types to be processed is increasing massively. On the other hand, the technical methods for processing data are becoming ever more powerful. The current question is to what extent new technologies from the fields of big data and artificial intelligence (AI) can unlock automation potential that allows compliance investigations to be carried out better, faster, and more cost-effectively. This article outlines the current state of practice as well as development potential in the near future.


  • Hauser, Christian; Jehan, Eleanor; Weichselbraun, Albert (2022): Internal Integrity Risk Warning System. Integrity Fund Meeting. Koenig & Bauer Banknote Solutions. Lausanne, 1 July 2022


  • Hauser, Christian; Jehan, Eleanor; Weichselbraun, Albert; Beier, Michael (2022): Whistleblower investigations in the age of Big Data and artificial intelligence. Working Group Meeting. ECS Working Group Whistleblowing. Zürich, 20 June 2022


  • Weichselbraun, Albert; Waldvogel, Roger; Fraefel, Andreas; van Schie, Alexander; Süsstrunk, Norman; Kuntschik, Philipp (2022): Slot Filling for Extracting Reskilling and Upskilling Options from the Web. 27th International Conference on Natural Language & Information Systems (NLDB). Universitat Politècnica de València. Valencia, 17 June 2022. Available online at https://www.youtube.com/watch?v=rIhhKjJAMnY&t=2608s, last checked on 24.11.2022

     


  • Weichselbraun, Albert; Waldvogel, Roger; Fraefel, Andreas; van Schie, Alexander; Kuntschik, Philipp (2022): Building Knowledge Graphs and Recommender Systems for Suggesting Reskilling and Upskilling Options from the Web. In: Information 13. Available online at https://doi.org/10.3390/info13110510, last checked on 24.11.2022

     

    Abstract: As advances in science and technology, crises, and increased competition impact labor markets, reskilling and upskilling programs have emerged to mitigate their effects. Since information on continuing education is highly distributed across websites, choosing career paths and suitable upskilling options is currently considered a challenging and cumbersome task. This article, therefore, introduces a method for building a comprehensive knowledge graph from the education providers' Web pages. We collect educational programs from 488 providers and leverage entity recognition and entity linking methods in conjunction with contextualization to extract knowledge on entities such as prerequisites, skills, learning objectives, and course content. Slot filling then integrates these entities into an extensive knowledge graph that contains close to 74,000 nodes and over 734,000 edges. A recommender system leverages the created graph and background knowledge on occupations to provide career path and upskilling suggestions. Finally, we evaluate the knowledge extraction approach on the CareerCoach 2022 gold standard and draw upon domain experts for judging the career paths and upskilling suggestions provided by the recommender system.

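The graph-plus-recommendation idea from the abstract above can be illustrated with a toy sketch. All node names, edge labels, and the ranking heuristic below are invented for illustration; the paper's actual knowledge graph and recommender are far richer:

```python
# Illustrative-only sketch: courses teach skills, occupations require skills,
# and a recommender ranks courses by how many missing skills they cover.
# Every name and the scoring rule here are hypothetical.

teaches = {
    "Data Literacy Basics": {"statistics", "data visualization"},
    "Applied Machine Learning": {"python", "statistics", "model evaluation"},
    "Cloud Fundamentals": {"networking", "virtualization"},
}
requires = {"data analyst": {"statistics", "python", "data visualization"}}

def upskilling_options(occupation: str, known_skills: set[str]) -> list[str]:
    """Rank courses by how many of the occupation's missing skills they teach."""
    missing = requires[occupation] - known_skills
    scored = [(len(skills & missing), course) for course, skills in teaches.items()]
    return [course for score, course in sorted(scored, reverse=True) if score > 0]

# Someone who already knows statistics gets the two courses that close
# the remaining gaps; the irrelevant cloud course is filtered out.
print(upskilling_options("data analyst", {"statistics"}))
```

In the real system, the edges would come from the entity extraction and slot filling steps described above rather than from hand-written dictionaries.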

  • Weichselbraun, Albert; van Schie, Alexander; Fraefel, Andreas; Kuntschik, Philipp; Waldvogel, Roger (2022): Career Coach. Automatische Wissensextraktion und Expertensystem für personalisierte Re- und Upskilling Vorschläge. In: Forster, Michael; Alt, Sharon; Hanselmann, Marcel; Deflorin, Patricia (Eds.): Digitale Transformation an der Fachhochschule Graubünden: Case Studies aus Forschung und Lehre. Chur: FH Graubünden Verlag, pp. 11-18

    Abstract: CareerCoach develops methods for the automatic extraction of continuing-education offerings. The system analyzes the websites of education providers and integrates their offerings into a central knowledge graph that supports innovative services such as semantic search and expert systems.


  • Hauser, Christian; Havelka, Anina; Hörler, Sandro; Weichselbraun, Albert (2021): Towards Developing an Integrity Risk Monitor (IRM). A Status Report. In: Makowicz, Bartosz: Global Ethics, Compliance & Integrity: Yearbook 2021. Bern: Peter Lang, pp. 123-131

    Abstract: Risks, which could jeopardize the integrity of a company, are widespread. This holds true for firms located in Switzerland too. According to a recent study by PricewaterhouseCoopers (2018), almost 40 percent of Swiss companies have been affected by illegal and unethical behavior, such as embezzlement, cybercrime, intellectual property infringements, corruption, fraud, money laundering, and anti-competitive agreements. Although the number of cases in Switzerland is relatively low when compared to other countries globally, the financial damage for affected Swiss companies caused by these incidents is nevertheless above the global average.


  • Hauser, Christian; Weichselbraun, Albert; Havelka, Anina; Hörler, Sandro; Waldvogel, Roger (2021): Integrity Risk Monitor. Chur: FH Graubünden Verlag. Available online at https://www.fhgr.ch/fhgr/unternehmerisches-handeln/schweizerisches-institut-fuer-entrepreneurship-sife/projekte/integrity-risk-monitor-irm/, last checked on 17.03.2022

     

    Abstract: Corporate integrity has gained importance both nationally and internationally in recent years. The business press repeatedly covers the behavior of companies that fail to live up to their corporate responsibility. At the same time, various stakeholder groups demand more transparency from companies regarding their activities. This prompts companies to report in their non-financial reporting on their efforts toward integrity in the areas of human rights, the environment, and anti-corruption. Within the Integrity Risk Monitor (IRM) research project, the IRM portal and the IRM dashboard were developed. These are web-based real-time monitoring instruments. The IRM portal comprises media articles from the last 25 years drawn from a variety of sources. Furthermore, the algorithm permanently crawls the World Wide Web and collects new articles from editorial media. Using the IRM dashboard's various analysis and visualization options, these articles can be examined to determine relationships, actors involved, sentiments, and main geographic regions. In addition, the project also examined companies' non-financial reporting in order to analyze relationships between media coverage and non-financial reporting. The results of the study make clear that both the media and the analyzed companies have reported more on the topics of human rights, the environment, and corruption over the last 25 years, but that for the time being there is no direct linear relationship between these two forms of reporting.


  • Hauser, Christian; Weichselbraun, Albert; Jehan, Eleanor; Schmid, Marco (2021): Internal integrity risk warning system (IIRWiS). Integrity Fund Meeting. Koenig & Bauer Banknote Solutions. Online, 29 March 2021


  • Weichselbraun, Albert; Steixner, Jakob; Braşoveanu, Adrian M.P.; Scharl, Arno; Göbel, Max; Nixon, Lyndon J.B. (2021): Automatic Expansion of Domain-Specific Affective Models for Web Intelligence Applications. In: Cognitive Computation. Available online at https://doi.org/10.1007/s12559-021-09839-4, last checked on 18.02.2021

     

    Abstract: Sentic computing relies on well-defined affective models of different complexity—polarity to distinguish positive and negative sentiment, for example, or more nuanced models to capture expressions of human emotions. When used to measure communication success, even the most granular affective model combined with sophisticated machine learning approaches may not fully capture an organisation’s strategic positioning goals. Such goals often deviate from the assumptions of standardised affective models. While certain emotions such as Joy and Trust typically represent desirable brand associations, specific communication goals formulated by marketing professionals often go beyond such standard dimensions. For instance, the brand manager of a television show may consider fear or sadness to be desired emotions for its audience. This article introduces expansion techniques for affective models, combining common and commonsense knowledge available in knowledge graphs with language models and affective reasoning, improving coverage and consistency as well as supporting domain-specific interpretations of emotions. An extensive evaluation compares the performance of different expansion techniques: (i) a quantitative evaluation based on the revisited Hourglass of Emotions model to assess performance on complex models that cover multiple affective categories, using manually compiled gold standard data, and (ii) a qualitative evaluation of a domain-specific affective model for television programme brands. The results of these evaluations demonstrate that the introduced techniques support a variety of embeddings and pre-trained models. The paper concludes with a discussion on applying this approach to other scenarios where affective model resources are scarce.


  • Weichselbraun, Albert; Kuntschik, Philipp; Francolino, Vincenzo; Saner, Mirco; Dahinden, Urs; Wyss, Vinzenz (2021): Adapting Data-Driven Research to the Fields of Social Sciences and the Humanities. In: Future Internet 13. Available online at doi.org/10.3390/fi13030059, last checked on 18.05.2021

     

    Abstract: Recent developments in the fields of computer science, such as advances in the areas of big data, knowledge extraction, and deep learning, have triggered the application of data-driven research methods to disciplines such as the social sciences and humanities. This article presents a collaborative, interdisciplinary process for adapting data-driven research to research questions within other disciplines, which considers the methodological background required to obtain a significant impact on the target discipline and guides the systematic collection and formalization of domain knowledge, as well as the selection of appropriate data sources and methods for analyzing, visualizing, and interpreting the results. Finally, we present a case study that applies the described process to the domain of communication science by creating approaches that aid domain experts in locating, tracking, analyzing, and, finally, better understanding the dynamics of media criticism. The study clearly demonstrates the potential of the presented method, but also shows that data-driven research approaches require a tighter integration with the methodological framework of the target discipline to achieve a truly significant impact on it.


  • Weichselbraun, Albert (2021): Inscriptis: A Python-based HTML to text conversion library optimized for knowledge extraction from the Web. In: Journal of Open Source Software 6. Available online at https://doi.org/10.21105/joss.03557, last checked on 22.10.2021

     

    Abstract: Inscriptis provides a library, command line client and Web service for converting HTML to plain text. Its development has been triggered by the need to obtain accurate text representations for knowledge extraction tasks that preserve the spatial alignment of text without drawing upon heavyweight, browser-based solutions such as Selenium (Huggins et al., 2021). In contrast to existing software packages such as HTML2text (Swartz, 2021), jusText (Belica, 2021) and Lynx (Dickey, 2021), Inscriptis (1) provides a layout-aware conversion of HTML that more closely resembles the rendering obtained from standard Web browsers and, therefore, better preserves the spatial arrangement of text elements. Inscriptis excels in terms of conversion quality, since it correctly converts complex HTML constructs such as nested tables and also interprets a subset of HTML (e.g., align, valign) and CSS (e.g., display, white-space, margin-top, vertical-align) attributes that determine the text alignment. (2) It supports annotation rules, i.e., user-provided mappings that allow for annotating the extracted text based on structural and semantic information encoded in HTML tags and attributes used for controlling structure and layout in the original HTML document. These unique features ensure that downstream knowledge extraction components can operate on accurate text representations, and may even use information on the semantics and structure of the original HTML document, if annotation support has been enabled.


  • Braşoveanu, Adrian M.P.; Weichselbraun, Albert; Nixon, Lyndon J.B. (2020): In Media Res: A Corpus for Evaluating Named Entity Linking with Creative Works. In: Fernández, Raquel; Linzen, Tal (Eds.): Proceedings of the 24th Conference on Computational Natural Language Learning: CoNLL 2020: Online, 19-20 November. Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 355-364. Available online at doi.org/10.18653/v1/2020.conll-1.28, last checked on 21.05.2021

     

    Abstract: Annotation styles express guidelines that direct human annotators in what rules to follow when creating gold standard annotations of text corpora. These guidelines not only shape the gold standards they help create, but also influence the training and evaluation of Named Entity Linking (NEL) tools, since different annotation styles correspond to divergent views on the entities present in the same texts. Such divergence is particularly present in texts from the media domain that contain references to creative works. In this work we present a corpus of 1000 annotated documents selected from the media domain. Each document is presented with multiple gold standard annotations representing various annotation styles. This corpus is used to evaluate a series of Named Entity Linking tools in order to understand the impact of the differences in annotation styles on the reported accuracy when processing highly ambiguous entities such as names of creative works. Relaxed annotation guidelines that include overlap styles lead to better results across all tools.


  • Hauser, Christian; Hörler, Sandro; Weichselbraun, Albert (2020): Development and publication of the Integrity Risk Monitor (IRM). Integrity Fund. Meeting of the project managers. Koenig & Bauer Banknote Solutions. Lausanne, 22 January 2020


  • Hauser, Christian; Weichselbraun, Albert (2020): Applications of Deep Learning in Integrity Management. Integrity Fund. Board Meeting. Koenig & Bauer Banknote Solutions. Online, 14 December 2020


  • Weichselbraun, Albert; Kuntschik, Philipp; Hörler, Sandro (2020): Optimierung von Unternehmensbewertungen durch automatisierte Wissensidentifikation, -extraktion und -integration. In: Information. Wissenschaft & Praxis 71, pp. 321-325. Available online at https://doi.org/10.1515/iwp-2020-2119, last checked on 30.10.2020

     

    Abstract: Company valuations in the biotech, pharmaceutical, and medical-technology industries are a demanding task, especially when taking into account the unique risks biotech start-ups face when entering new markets. Companies specializing in global valuation services therefore combine valuation models and past experience with heterogeneous metrics and indicators that provide insights into a company's performance. This article illustrates how automated knowledge identification, extraction, and integration can be used to (i) determine additional indicators that provide insights into a company's success in product development and (ii) support labor-intensive data collection processes for company valuation.


  • Weichselbraun, Albert; Kuntschik, Philipp; Hörler, Sandro (2020): Improving Company Valuations with Automated Knowledge Discovery, Extraction and Fusion. English translation of the article "Optimierung von Unternehmensbewertungen durch automatisierte Wissensidentifikation, -extraktion und -integration", Information - Wissenschaft und Praxis 71 (5-6):321-325. Available online at https://arxiv.org/abs/2010.09249, last checked on 18.05.2021

     

    Abstract: Performing company valuations within the domain of biotechnology, pharmacy and medical technology is a challenging task, especially when considering the unique set of risks biotech start-ups face when entering new markets. Companies specialized in global valuation services, therefore, combine valuation models and past experience with heterogeneous metrics and indicators that provide insights into a company's performance. This paper illustrates how automated knowledge discovery, extraction and data fusion can be used to (i) obtain additional indicators that provide insights into the success of a company's product development efforts, and (ii) support labor-intensive data curation processes. We apply deep web knowledge acquisition methods to identify and harvest data on clinical trials that is hidden behind proprietary search interfaces and integrate the extracted data into the industry partner's company valuation ontology. In addition, focused Web crawls and shallow semantic parsing yield information on the company's key personnel and respective contact data, notifying domain experts of relevant changes that get then incorporated into the industry partner's company data.


  • Weichselbraun, Albert; Hörler, Sandro; Hauser, Christian; Havelka, Anina (2020): Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence. In: Chbeir, Richard; Manolopoulos, Yannis; Akerkar, Rajendra; Mizera-Pietraszko, Jolanta (Eds.): Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics: WIMS 2020: Biarritz, France, 30 June - 3 July. New York, NY, USA: Association for Computing Machinery (ACM), pp. 54-62. Available online at doi.org/10.1145/3405962.3405988, last checked on 21.05.2021

     

    Abstract: A substantial number of international corporations have been affected by corruption. The research presented in this paper introduces the Integrity Risks Monitor, an analytics dashboard that applies Web Intelligence and Deep Learning to English- and German-language documents for the task of (i) tracking and visualizing past corruption management gaps and their respective impacts, (ii) understanding present and past integrity issues, (iii) supporting companies in analyzing news media for identifying and mitigating integrity risks. Afterwards, we discuss the design, implementation, training and evaluation of classification components capable of identifying English documents covering the integrity topic of corruption. Domain experts created a gold standard dataset compiled from Anglo-American media coverage on corruption cases that has been used for training and evaluating the classifier. The experiments performed to evaluate the classifiers draw upon popular algorithms used for text classification such as Naïve Bayes, Support Vector Machines (SVM) and Deep Learning architectures (LSTM, BiLSTM, CNN) that draw upon different word embeddings and document representations. They also demonstrate that although classical machine learning approaches such as Naïve Bayes struggle with the diversity of the media coverage on corruption, state-of-the-art Deep Learning models perform sufficiently well in the project's context.

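The classical baselines named in the abstract above (Naïve Bayes, SVM) can be sketched with scikit-learn. The four-document toy corpus and its labels below are invented; the project's gold standard is far larger and not public here:

```python
# Generic text-classification baselines: TF-IDF features fed into
# Naive Bayes and a linear SVM. Toy data only, not the paper's corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "officials charged in bribery scandal",
    "prosecutors probe kickback scheme",
    "quarterly earnings beat expectations",
    "company launches new product line",
]
labels = ["corruption", "corruption", "other", "other"]

for clf in (MultinomialNB(), LinearSVC()):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(docs, labels)
    # An unseen headline sharing corruption vocabulary should land
    # in the "corruption" class for both baselines.
    print(clf.__class__.__name__, model.predict(["bribery probe widens"]))
```

The deep-learning models mentioned in the abstract replace the TF-IDF features with word embeddings and the linear classifiers with LSTM/BiLSTM/CNN architectures.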

  • Weichselbraun, Albert; Braşoveanu, Adrian M.P.; Waldvogel, Roger; Odoni, Fabian (2020): Harvest: An Open Source Toolkit for Extracting Posts and Post Metadata from Web Forums. The 20th IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology: A Hybrid Conference with both Online and Offline Modes. Melbourne, Australia, 14-17 December

    Abstract: Web forums discuss topics of long-term, persisting involvements in domains such as health, mobile software development and online gaming, some of which are of high interest from a research and business perspective. In the medical domain, for example, forums contain information on symptoms, drug side effects and patient discussions that are highly relevant for patient-focused healthcare and drug development. Automatic extraction of forum posts and metadata is a crucial but challenging task since forums do not expose their content in a standardized structure. Content extraction methods, therefore, often need customizations such as adaptations to page templates and improvements of their extraction code before they can be deployed to new forums. Most of the current solutions are also built for the more general case of content extraction from web pages and lack key features important for understanding forum content such as the identification of author metadata and information on the thread structure. This paper, therefore, presents a method that determines the XPath of forum posts, eliminating incorrect mergers and splits of the extracted posts that were common in systems from the previous generation. Based on the individual posts further metadata such as authors, forum URL and structure are extracted. We evaluate our approach by creating a gold standard which contains 102 forum pages from 52 different Web forums, and benchmarking against a baseline and competing tools.

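Once the XPath of forum posts has been determined, as described in the abstract above, extracting each post and its author metadata is straightforward. The following is a hypothetical sketch in the spirit of that step, not Harvest's actual code; the HTML snippet and class names are invented:

```python
# XPath-based extraction of forum posts and author metadata with lxml.
# Illustrative only: a real forum page and the learned XPath would differ.
from lxml import html

page = """
<div class="thread">
  <div class="post"><span class="author">alice</span><p>Dose was fine.</p></div>
  <div class="post"><span class="author">bob</span><p>I had headaches.</p></div>
</div>
"""

tree = html.fromstring(page)
# Extracting at the post level avoids the incorrect mergers and splits
# of adjacent posts that plagued earlier content-extraction systems.
posts = [
    {
        "author": node.xpath('.//span[@class="author"]/text()')[0],
        "text": node.xpath(".//p/text()")[0],
    }
    for node in tree.xpath('//div[@class="post"]')
]
print(posts)
```

The hard part, which Harvest automates, is determining the post XPath (`//div[@class="post"]` here) for each of the many non-standardized forum templates.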

  • Weichselbraun, Albert; Hauser, Christian; Hörler, Sandro; Havelka, Anina (2020): Deep learning and visual tools for analyzing and monitoring integrity risks. 5th SwissText & 16th KONVENS Joint Conference. Online, 23-25 June 2020. Available online at https://youtu.be/S9Oxw_UlaW0, last checked on 28.05.2021

     


  • Odoni, Fabian; Braşoveanu, Adrian M.P.; Kuntschik, Philipp; Weichselbraun, Albert (2019): Introducing Orbis. An extendable evaluation pipeline for named entity linking performance drill-down analyses. In: Blake, Catherine; Brown, Cecelia (Eds.): 82nd Annual Meeting of The Association for Information Science: Proceedings, 56: ASIS&T 2019: Melbourne, Australia, 19-23 October. Somerset, NJ, USA: John Wiley & Sons, Ltd, pp. 468-471. Available online at doi.org/10.1002/pra2.49, last checked on 21.05.2021

     

    Abstract: Most current evaluation tools focus solely on benchmarking and comparative evaluations and thus only provide aggregated statistics such as precision, recall and F1-measure to assess overall system performance. They do not offer comprehensive analyses up to the level of individual annotations. This paper introduces Orbis, an extendable evaluation pipeline framework developed to allow visual drill-down analyses of individual entities, computed by annotation services, in the context of the text they appear in, in reference to the entities specified in the gold standard.

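The aggregated statistics named in the abstract above are computed from the overlap between gold-standard and predicted annotations. A from-scratch sketch, with an invented toy annotation format of `(document, start, end, entity)` tuples:

```python
# Precision, recall and F1 for a toy named-entity-linking run.
# The annotation tuples below are illustrative only.

def precision_recall_f1(gold: set, predicted: set) -> tuple[float, float, float]:
    tp = len(gold & predicted)          # annotations that exactly match the gold standard
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("doc1", 0, 5, "Q64"), ("doc1", 10, 16, "Q183")}
predicted = {("doc1", 0, 5, "Q64"), ("doc1", 20, 26, "Q30")}
print(precision_recall_f1(gold, predicted))  # → (0.5, 0.5, 0.5)
```

Orbis goes beyond these aggregates by letting analysts inspect each individual true positive, false positive, and false negative in its textual context.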

  • Rinaldi, Fabio; Kuntschik, Philipp; Gottowik, Jürgen; Leddin, Mathias; Esteban, Raul R.; Weichselbraun, Albert; Ellendorff, Tilia; Colic, Nico; Furrer, Lenz (2019): MedMon: social media analytics for a healthcare application. 4th SwissText Analytics Conference. Winterthur, 18-19 June 2019. Available online at https://youtu.be/SA61WJ57XAc, last checked on 28.05.2021

     


  • Weichselbraun, Albert (2019): Datenakquiseprozesse mittels Big Data optimieren (Einblicke in die Forschung). Available online at https://www.fhgr.ch/fileadmin/publikationen/forschungsbericht/fhgr-Einblicke_in_die_Forschung_2019.pdf, last checked on 09.04.2021

     

    Abstract: Within the DISCOVER project, methods are being developed for the automatic acquisition of data and for the extraction and integration of decision-relevant information from heterogeneous online sources; these methods are also capable of analyzing content from the Deep Web.


  • Weichselbraun, Albert; Kuntschik, Philipp; Braşoveanu, Adrian M.P. (2019): Name Variants for Improving Entity Discovery and Linking. In: Eskevich, Maria; Melo, Gerard de; Fäth, Christian; McCrae, John P.; Buitelaar, Paul; Chiarcos, Christian; Klimek, Bettina; Dojchinovski, Milan (Eds.): 2nd Conference on Language, Data and Knowledge: LDK 2019: Leipzig, 20-23 May. Saarbrücken/Wadern: Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing (OASIcs), pp. 14:1-14:15. Available online at https://doi.org/10.4230/OASIcs.LDK.2019.14, last checked on 21.05.2021

     

    Abstract: Identifying all names that refer to a particular set of named entities is a challenging task, as quite often we need to consider many features that include a lot of variation like abbreviations, aliases, hypocorisms, multilingualism or partial matches. Each entity type can also have specific rules for name variances: people names can include titles, country and branch names are sometimes removed from organization names, while locations are often plagued by the issue of nested entities. The lack of a clear strategy for collecting, processing and computing name variants significantly lowers the recall of tasks such as Named Entity Linking and Knowledge Base Population since name variances are frequently used in all kinds of textual content. This paper proposes several strategies to address these issues. Recall can be improved by combining knowledge repositories and by computing additional variances based on algorithmic approaches. Heuristics and machine learning methods then analyze the generated name variances and mark ambiguous names to increase precision. An extensive evaluation demonstrates the effects of integrating these methods into a new Named Entity Linking framework and confirms that systematically considering name variances yields significant performance improvements.

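The algorithmic computation of name variants motivated in the abstract above can be illustrated with a small sketch. The three rules below (surname only, abbreviated first name, inverted order) are invented examples of such rules, not the paper's actual strategy set:

```python
# Hypothetical sketch of algorithmic name-variant generation for
# person names; real systems combine many more rules with knowledge
# repositories and ambiguity filtering.

def name_variants(name: str) -> set[str]:
    """Generate simple surface-form variants of a person name."""
    variants = {name}
    parts = name.split()
    if len(parts) >= 2:
        first, last = parts[0], parts[-1]
        variants.add(last)                          # surname only
        variants.add(f"{first[0]}. {last}")         # abbreviated first name
        variants.add(f"{last}, {first}")            # inverted order
    return variants

print(name_variants("Albert Weichselbraun"))
```

Matching mentions against such a variant set raises recall; the heuristics and machine learning methods described above then flag ambiguous variants (e.g., a bare surname shared by several people) to protect precision.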

  • Weichselbraun, Albert; Braşoveanu, Adrian M.P.; Kuntschik, Philipp; Nixon, Lyndon J.B. (2019): Improving Named Entity Linking Corpora Quality. In: Angelova, Galia; Mitkov, Ruslan; Nikolova, Ivelina; Temnikova, Irina (Eds.): Natural Language Processing in a Deep Learning World: Proceedings: International Conference Recent Advances in Natural Language Processing (RANLP 2019): Varna, Bulgaria, 2-4 September. Bulgaria: Ltd., Shoumen, pp. 1328-1337. Available online at https://doi.org/10.26615/978-954-452-056-4_152, last checked on 21.05.2021

     

    Abstract: Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems. The quality of the used corpora and the used evaluation metrics are crucial in this process. We, therefore, assess the quality of three popular evaluation corpora, identifying four major issues which affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) Knowledge Base evolution, (iv) and differences in annotating co-occurrences. This paper addresses these issues by formalizing NEL annotations and corpus versioning which allows standardizing corpus creation, supports corpus evolution, and paves the way for the use of lenses to automatically transform between different corpus configurations. In addition, the use of clearly defined scoring rules and evaluation metrics ensures a better comparability of evaluation results.


  • Weichselbraun, Albert (2019): Capturing, analyzing and visualizing user generated content from social media. 27th Conference on Intelligent Systems for Molecular Biology (ISMB); 17th European Conference on Computational Biology (ECCB); Special session on Social media mining for drug discovery research: challenges and opportunities of Real World Text. Basel, 21-25 June 2019


    Abstract: Source format variability and noise are major challenges when harvesting content from social media. This presentation discusses methods and abstractions for gathering user generated content from Web pages and social media platforms covering (i) structured content, (ii) platforms that leverage Semantic Web standards such as Microformats, RDFa and JSON-LD, and (iii) semi-structured or even unstructured content that is typically found in Web forums. We then discuss pre-processing and anonymization tasks and outline how the collected content is annotated, aggregated and summarized in a so-called contextualized information space. An interactive dashboard provides efficient means for analyzing, browsing and visualizing this information space. The dashboard supports analysts in identifying emerging trends and topics, exploring the lexical, geospatial and relational context of topics and entities such as health conditions, diseases, symptoms and drugs, and performing drill-down analysis to shed light on individual posts and statements that cause the observed effects.


  • Braşoveanu, Adrian M.P.; Nixon, Lyndon J.B.; Weichselbraun, Albert (2018): StoryLens: A Multiple Views Corpus for Location and Event Detection. In: Akerkar, Rajendra; Ivanović, Mirjana; Kim, Sang-Wook; Manolopoulos, Yannis; Rosati, Riccardo; Savić, Miloš; Badica, Costin; Radovanović, Miloš (Hg.): Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, Article No.: 30: WIMS '18: Novi Sad, Serbia, 25.-27. Juni: New York, NY, USA: Association for Computing Machinery (ACM). Online verfügbar unter https://doi.org/10.1145/3227609.3227674, zuletzt geprüft am 21.05.2021


    Abstract: The news media landscape tends to focus on long-running narratives. Correctly processing new information, therefore, requires considering multiple lenses when analyzing media content. Traditionally it would have been considered sufficient to extract the topics or entities contained in a text in order to classify it, but today it is important to also look at more sophisticated annotations related to fine-grained geolocation, events, stories and the relations between them. In order to leverage such lenses we propose a new corpus that offers a diverse set of annotations over texts collected from multiple media sources. We also showcase the framework used for creating the corpus, as well as how the information from the various lenses can be used in order to support different use cases in the EU project InVID for verifying the veracity of online video.


  • Braşoveanu, Adrian M.P.; Rizzo, Giuseppe; Kuntschik, Philipp; Weichselbraun, Albert; Nixon, Lyndon J.B. (2018): Framing Named Entity Linking Error Types. In: Calzolari, Nicoletta; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Hasida, Koiti; Isahara, Hitoshi; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios; Tokunaga, Takenobu (Hg.): Eleventh International Conference on Language Resources and Evaluation: Conference Proceedings. Unter Mitarbeit von Sara Goggi und Hélène Mazo: LREC '18: Miyazaki, Japan, 7.-12. Mai: Paris: European Language Resources Association (ELRA), S. 266-271. Online verfügbar unter https://www.aclweb.org/anthology/L18-1040/, zuletzt geprüft am 21.05.2021


    Abstract: Named Entity Linking (NEL) and relation extraction form the backbone of Knowledge Base Population tasks. The recent rise of large open source Knowledge Bases and the continuous focus on improving NEL performance has led to the creation of automated benchmark solutions during the last decade. The benchmarking of NEL systems offers a valuable approach to understand a NEL system’s performance quantitatively. However, an in-depth qualitative analysis that helps improve NEL methods by identifying error causes usually requires a more thorough error analysis. This paper proposes a taxonomy to frame common errors and applies this taxonomy in a survey study to assess the performance of four well-known Named Entity Linking systems on three recent gold standards.


  • Odoni, Fabian; Kuntschik, Philipp; Braşoveanu, Adrian M.P.; Weichselbraun, Albert (2018): On the Importance of Drill-Down Analysis for Assessing Gold Standards and Named Entity Linking Performance. SEMANTiCS 2018: 14th International Conference on Semantic Systems. In: Procedia Computer Science 137, S. 33-42. Online verfügbar unter https://doi.org/10.1016/j.procs.2018.09.004, zuletzt geprüft am 21.05.2021


    Abstract: Rigorous evaluations and analyses of evaluation results are key towards improving Named Entity Linking systems. Nevertheless, most current evaluation tools are focused on benchmarking and comparative evaluations. Therefore, they only provide aggregated statistics such as precision, recall and F1-measure to assess system performance and no means for conducting detailed analyses up to the level of individual annotations. This paper addresses the need for transparent benchmarking and fine-grained error analysis by introducing Orbis, an extensible framework that supports drill-down analysis, multiple annotation tasks and resource versioning. Orbis complements approaches like those deployed through the GERBIL and TAC KBP tools and helps developers to better understand and address shortcomings in their Named Entity Linking tools. We present three use cases in order to demonstrate the usefulness of Orbis for both research and production systems: (i) improving Named Entity Linking tools; (ii) detecting gold standard errors; and (iii) performing Named Entity Linking evaluations with multiple versions of the included resources.


  • Weichselbraun, Albert (2018): Optimierung von Karriere- und Recruitingprozessen mittels Web Analytics und künstlicher Intelligenz (Einblicke in die Forschung). Online verfügbar unter https://www.fhgr.ch/fileadmin/publikationen/forschungsbericht/fhgr-Einblicke_in_die_Forschung_2018.pdf, zuletzt geprüft am 09.04.2021


    Abstract: Automated methods can support the targeted search for qualified candidates, the analysis of career trajectories, and career planning and continuing education processes.


  • Weichselbraun, Albert; Kuntschik, Philipp; Braşoveanu, Adrian M.P. (2018): Mining and Leveraging Background Knowledge for Improving Named Entity Linking. In: Akerkar, Rajendra; Ivanović, Mirjana; Kim, Sang-Wook; Manolopoulos, Yannis; Rosati, Riccardo; Savić, Miloš; Badica, Costin; Radovanović, Miloš (Hg.): Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, Article No.: 27: WIMS '18: Novi Sad, Serbia, 25.-27. Juni: New York, NY, USA: Association for Computing Machinery (ACM). Online verfügbar unter https://doi.org/10.1145/3227609.3227670, zuletzt geprüft am 21.05.2021


    Abstract: Knowledge-rich Information Extraction (IE) methods aspire towards combining classical IE with background knowledge obtained from third-party resources. Linked Open Data repositories that encode billions of machine-readable facts from sources such as Wikipedia play a pivotal role in this development. The recent growth of Linked Data adoption for Information Extraction tasks has shed light on many data quality issues in these data sources, such as completeness, timeliness and semantic correctness, that seriously challenge their usefulness. Information Extraction methods are, therefore, faced with problems such as name variance and type confusability. If multiple linked data sources are used in parallel, additional concerns regarding link stability and entity mappings emerge. This paper develops methods for integrating Linked Data into Named Entity Linking methods and addresses challenges with regard to mining knowledge from Linked Data, mitigating data quality issues, and adapting algorithms to leverage this knowledge. Finally, we apply these methods to Recognyze, a graph-based Named Entity Linking (NEL) system, and provide a comprehensive evaluation which compares its performance to other well-known NEL systems, demonstrating the impact of the suggested methods on its entity linking performance.


  • Weichselbraun, Albert (2018): On the convergence of Artificial Intelligence and Big Data. Potential, challenges and impact. Keynote. Graubünden forscht. Academia Raetica. Davos, 19. September, 2018


  • Weichselbraun, Albert; Kuntschik, Philipp; Süsstrunk, Norman; Odoni, Fabian; Braşoveanu, Adrian M.P. (2018): Optimizing Information Acquisition and Decision Making Processes with Natural Language Processing, Machine Learning and Visual Analytics. 3rd SwissText Analytics Conference. Winterthur, 12.-13. Juni, 2018. Online verfügbar unter https://youtu.be/YicWN1rEn7M, zuletzt geprüft am 28.05.2021


  • Marx, Edgard; Shekarpour, Saeedeh; Soru, Tommaso; Braşoveanu, Adrian M.P.; Saleem, Muhammad; Baron, Ciro; Weichselbraun, Albert; Lehmann, Jens; Ngomo, Axel-Cyrille Ngonga; Auer, Sören (2017): Torpedo: Improving the State-of-the-Art RDF Dataset Slicing: 11th International Conference on Semantic Computing: ICSC: San Diego, CA, USA, 30. Januar - 1. Februar: Piscataway, NJ: Institute of Electrical and Electronic Engineers (IEEE), S. 149-156. Online verfügbar unter https://doi.org/10.1109/ICSC.2017.79, zuletzt geprüft am 21.05.2021


    Abstract: Over the last years, the amount of data published as Linked Data on the Web has grown enormously. In spite of the high availability of Linked Data, organizations still encounter an accessibility challenge while consuming it. This is mostly due to the large size of some of the datasets published as Linked Data. The core observation behind this work is that a subset of these datasets suffices to address the needs of most organizations. In this paper, we introduce Torpedo, an approach for efficiently selecting and extracting relevant subsets from RDF datasets. In particular, Torpedo adds optimization techniques to reduce the cost of seek operations, as well as support for multi-join graph patterns and SPARQL FILTERs that enable more granular data selection. We compare the performance of our approach with existing solutions on nine different queries against four datasets. Our results show that our approach is highly scalable and is up to 26% faster than the current state-of-the-art RDF dataset slicing approach.


  • Scharl, Arno; Herring, David; Rafelsberger, Walter; Hubmann-Haidvogel, Alexander; Kamolov, Ruslan; Fischl, Daniel; Fols, Michael; Weichselbraun, Albert (2017): Semantic Systems and Visual Tools to Support Environmental Communication. In: IEEE Systems Journal 11, S. 762-771. Online verfügbar unter https://doi.org/10.1109/JSYST.2015.2466439, zuletzt geprüft am 24.07.2020


    Abstract: Given the intense attention that environmental topics such as climate change attract in news and social media coverage, scientists and communication professionals want to know how different stakeholders perceive observable threats and policy options, how specific media channels react to new insights, and how journalists present scientific knowledge to the public. This paper investigates the potential of semantic technologies to address these questions. After summarizing methods to extract and disambiguate context information, we present visualization techniques to explore the lexical, geospatial, and relational context of topics and entities referenced in these repositories. The examples stem from the Media Watch on Climate Change, the Climate Resilience Toolkit and the NOAA Media Watch, three applications that aggregate environmental resources from a wide range of online sources. These systems not only show the value of providing comprehensive information to the public, but also have helped to develop a novel communication success metric that goes beyond bipolar assessments of sentiment.


  • Weichselbraun, Albert; Gindl, Stefan; Fischer, Fabian; Vakulenko, Svitlana; Scharl, Arno (2017): Aspect-Based Extraction and Analysis of Affective Knowledge from Social Media Streams. In: IEEE Intelligent Systems 32, S. 80-88. Online verfügbar unter https://doi.org/10.1109/MIS.2017.57, zuletzt geprüft am 18.05.2021


    Abstract: Extracting and analyzing affective knowledge from social media in a structured manner is a challenging task. Decision makers require insights into the public perception of a company's products and services, as a strategic feedback channel to guide communication campaigns, and as an early warning system to quickly react in the case of unforeseen events. The approach presented in this article goes beyond bipolar metrics of sentiment. It combines factual and affective knowledge extracted from rich public knowledge bases to analyze emotions expressed toward specific entities (targets) in social media. The authors obtain common and common-sense domain knowledge from DBpedia and ConceptNet to identify potential sentiment targets. They employ affective knowledge about emotional categories available from SenticNet to assess how those targets and their aspects (such as specific product features) are perceived in social media. An evaluation shows the usefulness and correctness of the extracted domain knowledge, which is used in a proof-of-concept data analytics application to investigate the perception of car brands on social media in the period between September and November 2015.


  • Weichselbraun, Albert; Kuntschik, Philipp (2017): Mitigating linked data quality issues in knowledge-intense information extraction methods. In: Akerkar, Rajendra; Cuzzocrea, Alfredo; Cao, Jiannong; Hacid, Mohand-Said (Hg.): Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, Article No.: 17: WIMS '17: Amantea, Italy, 19.-22. Juni: New York, NY, USA: Association for Computing Machinery (ACM). Online verfügbar unter https://doi.org/10.1145/3102254.3102272, zuletzt geprüft am 21.05.2021


    Abstract: Advances in research areas such as named entity linking and sentiment analysis have triggered the emergence of knowledge-intensive information extraction methods that combine classical information extraction with background knowledge from the Web. Despite data quality concerns, linked data sources such as DBpedia, GeoNames and Wikidata which encode facts in a standardized structured format are particularly attractive for such applications. This paper addresses the problem of data quality by introducing a framework that elaborates on linked data quality issues relevant to different stages of the background knowledge acquisition process, their impact on information extraction performance and applicable mitigation strategies. Applying this framework to named entity linking and data enrichment demonstrates the potential of the introduced mitigation strategies to lessen the impact of different kinds of data quality problems. An industrial use case that aims at the automatic generation of image metadata from image descriptions illustrates the successful deployment of knowledge-intensive information extraction in real-world applications and constraints introduced by data quality concerns.


  • Braşoveanu, Adrian M.P.; Nixon, Lyndon J.B.; Weichselbraun, Albert; Scharl, Arno (2016): A Regional News Corpora for Contextualized Entity Discovery and Linking. In: Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Goggi, Sara; Grobelnik, Marko; Maegaard, Bente; Mariani, Joseph; Mazo, Hélène; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (Hg.): Tenth International Conference on Language Resources and Evaluation: Conference Proceedings: LREC '16: Portorož, Slovenia, Mai: Paris: European Language Resources Association (ELRA), S. 3333-3338. Online verfügbar unter https://www.aclweb.org/anthology/L16-1531, zuletzt geprüft am 21.05.2021


    Abstract: This paper presents a German corpus for Named Entity Linking (NEL) and Knowledge Base Population (KBP) tasks. We describe the annotation guideline, the annotation process, NIL clustering techniques and conversion to popular NEL formats such as NIF and TAC that have been used to construct this corpus based on news transcripts from the German regional broadcaster RBB (Rundfunk Berlin Brandenburg). Since creating such language resources requires significant effort, the paper also discusses how to derive additional evaluation resources for tasks like named entity contextualization or ontology enrichment by exploiting the links between named entities from the annotated corpus. The paper concludes with an evaluation that shows how several well-known NEL tools perform on the corpus, a discussion of the evaluation results, and with suggestions on how to keep evaluation corpora and datasets up to date.


  • Scharl, Arno; Hubmann-Haidvogel, Alexander; Jones, Alistair; Fischl, Daniel; Kamolov, Ruslan; Weichselbraun, Albert; Rafelsberger, Walter (2016): Analyzing the public discourse on works of fiction. Detection and visualization of emotion in online coverage about HBO's Game of Thrones. In: Information processing & management 52, S. 129-138. Online verfügbar unter https://doi.org/10.1016/j.ipm.2015.02.003, zuletzt geprüft am 18.05.2021


    Abstract: This paper presents a Web intelligence portal that captures and aggregates news and social media coverage about "Game of Thrones", an American drama television series created for the HBO television network based on George R.R. Martin's series of fantasy novels. The system collects content from the Web sites of Anglo-American news media as well as from four social media platforms: Twitter, Facebook, Google+ and YouTube. An interactive dashboard with trend charts and synchronized visual analytics components not only shows how often Game of Thrones events and characters are being mentioned by journalists and viewers, but also provides a real-time account of concepts that are being associated with the unfolding storyline and each new episode. Positive or negative sentiment is computed automatically, which sheds light on the perception of actors and new plot elements.


  • Scharl, Arno; Weichselbraun, Albert; Göbel, Max; Rafelsberger, Walter; Kamolov, Ruslan (2016): Scalable Knowledge Extraction and Visualization for Web Intelligence. In: Bui, Tung X.; Sprague, Ralph H. (Hg.): Proceedings of the 49th Annual Hawaii International Conference on System Sciences: 49th Hawaii International Conference on System Sciences (HICSS): Koloa, HI, 5.-8. Januar: Piscataway, NJ: Institute of Electrical and Electronic Engineers (IEEE), S. 3749-3757. Online verfügbar unter https://doi.org/10.1109/HICSS.2016.467, zuletzt geprüft am 21.05.2021


    Abstract: Understanding stakeholder perceptions and assessing the impact of campaigns are key questions of communication experts. Web intelligence platforms help to answer such questions, provided that they are scalable enough to analyze and visualize information flows from volatile online sources in real time. This paper presents a distributed architecture for aggregating Web content repositories from Web sites and social media streams, memory-efficient methods to extract factual and affective knowledge, and interactive visualization techniques to explore the extracted knowledge. The presented examples stem from the Media Watch on Climate Change, a public Web portal that aggregates environmental content from a range of online sources.


  • Vakulenko, Svitlana; Weichselbraun, Albert; Scharl, Arno (2016): Detection of Valid Sentiment-Target Pairs in Online Product Reviews and News Media Articles: IEEE/WIC/ACM International Conference on Web Intelligence: Proceedings: WI: Omaha, NE, USA, 13.-16. Oktober: Piscataway, NJ: Institute of Electrical and Electronic Engineers (IEEE), S. 97-104. Online verfügbar unter https://doi.org/10.1109/WI.2016.0024, zuletzt geprüft am 21.05.2021


    Abstract: This paper investigates the linking of sentiments to their respective targets, a sub-task of fine-grained sentiment analysis. Many different features have been proposed for this task, but often without a formal evaluation. We employ a recursive feature elimination approach to identify features that optimize predictive performance. Our experimental evaluation draws upon two corpora of product reviews and news articles annotated with sentiments and their targets. We introduce competitive baselines, outline the performance of the proposed approach, and report the most useful features for sentiment target linking. The results help to better understand how sentiment-target relations are expressed in the syntactic structure of natural language, and how this information can be used to build systems for fine-grained sentiment analysis.


  • Weichselbraun, Albert (2016): Big Data Technologien für KMUs. Bund fördert Investitionen. Blog (FHGR Blog). Online verfügbar unter https://blog.fhgr.ch/blog/big-data-technologien-fuer-kmus-der-bund-foerdert-investitionen/, zuletzt geprüft am 28.03.2021


    Abstract: Data is often described as the fuel of the twenty-first century. The ability to analyze internal and external data and to draw on it for decision-making, for optimizing business processes and for developing new products will play a key role in the competitiveness of companies and public institutions in the near future.


  • Weichselbraun, Albert; Scharl, Arno; Gindl, Stefan (2016): Extracting Opinion Targets from Environmental Web Coverage and Social Media Streams. In: Bui, Tung X.; Sprague, Ralph H. (Hg.): Proceedings of the 49th Annual Hawaii International Conference on System Sciences: 49th Hawaii International Conference on System Sciences (HICSS): Koloa, HI, 5.-8. Januar: Piscataway, NJ: Institute of Electrical and Electronic Engineers (IEEE), S. 1040-1048. Online verfügbar unter https://doi.org/10.1109/HICSS.2016.133, zuletzt geprüft am 21.05.2021


    Abstract: Policy makers and environmental organizations have a keen interest in awareness building and the evolution of stakeholder opinions on environmental issues. Mere polarity detection, as provided by many existing methods, does not suffice to understand the emergence of collective awareness. Methods for extracting affective knowledge should be able to pinpoint opinion targets within a thread. Opinion target extraction provides a more accurate and fine-grained identification of opinions expressed in online media. This paper compares two different approaches for identifying potential opinion targets and applies them to comments from the YouTube video sharing platform. The first approach is based on statistical keyword analysis in conjunction with sentiment classification on the sentence level. The second approach uses dependency parsing to pinpoint the target of an opinionated term. A case study based on YouTube postings applies the developed methods and measures their ability to handle noisy input data from social media streams.


  • Weichselbraun, Albert (2016): Chances and Challenges in Text and Data Mining. IARU Library Meeting in Zurich. Zürich, 9. Juni, 2016


  • Dahinden, Urs; Francolino, Vincenzo; Weichselbraun, Albert (2015): Risikokommunikation zum Stromnetzausbau. Ergebnisse einer international vergleichenden Inhaltsanalyse von Massenmedien und Online-Medien. SGKM-Jahrestagung. Schweizerische Gesellschaft für Kommunikations- und Medienwissenschaft. Bern, 14. März, 2015


  • Dahinden, Urs; Weichselbraun, Albert; Schuldt, Karsten; Francolino, Vincenzo; Odoni, Fabian (2015): Swiss System for Monitoring bibliographic data and Holistic publication behavior analysis (SYMPHONY): Requirement analysis. Final report of the project SYMPHONY (142-008) in the swissuniversities program: SUC 2013-2016 P-2: „Scientific information: Access, processing and safeguarding“. Chur, Version 1.2, 2. September. Online verfügbar unter https://www.fhgr.ch/fhgr/angewandte-zukunftstechnologien/schweizerisches-institut-fuer-informationswissenschaft-sii/projekte/symphony/, zuletzt geprüft am 03.07.2020


    Abstract: The objective of the “Swiss System for Monitoring bibliographic data and Holistic publication behavior analysis” (SYMPHONY) project was to set up a study that is able to monitor the publication behavior of researchers in Switzerland in a systematic and continuing way. Due to the complexity of these tasks and the high number of stakeholders involved, SYMPHONY was conceptualized as a pre-study that identifies and analyses the requirements of the key stakeholders towards such a system. Several methods have been used to reach the project goal: In a first step, a review of the international literature gave valuable insights into the potential, but also the problems associated with current approaches towards monitoring publication behavior by means of bibliometrics (e.g. bias against Open Access publication formats). As a second methodological step, the project team ran a stakeholder dialog that included 40 interviews with key stakeholders and experts in the field (all universities and most universities of applied sciences, a selection of research organizations, funding agencies, bibliometric experts etc.). This stakeholder dialog was necessary in order to take the considerable heterogeneity and decentralized structure of the Swiss science system into account. The interview partners were asked about their current practice of measuring the quantity and quality of scientific output with a focus on publication monitoring (technical infrastructure, financial resources, organizational guidelines and processes) and their needs and requirements for a new or adapted infrastructure. The expert interviews have clearly shown that the majority of stakeholders in the Swiss science system considers the current status quo of bibliographic data collection and publication analysis problematic, because a number of scientific disciplines (social sciences and humanities) and a considerable number of scientific publication formats (e.g. narrow selection of books and book chapters, exclusion of peer reviewed journals that are not included in the dominant bibliometric data base) are not adequately represented in the dominant bibliometric systems (e.g. Web of Science by Thomson Reuters). Based on the findings from the expert interviews, the project team has developed the following four scenarios: (1) maintain the status quo, (2) perform targeted studies, (3) create a new infrastructure for monitoring the publication behavior of Swiss scientists, (4) scenario (3) plus a framework for assessing the societal impact of publications, projects and institutions. These scenarios were presented to the experts and stakeholders at the project workshop with the opportunity to comment and to provide feedback. One important result of the workshop was that the participants recommended focusing on scenario (3) for the further project development, aiming at the creation of a new infrastructure with a clearly and narrowly defined task to monitor the publication behavior of Swiss scientists. Based on the feedback from the stakeholder workshop, the project team has developed a revised and detailed version of scenario (3) that was considered the best approach to meet the ambitious goals set by the White Paper. The final chapter lists the requirements for the current and future monitoring of scientific publications in Switzerland and gives a preview of the planned follow-up project “SYMPHONY - Proof of concept”.


  • Dahinden, Urs; Weichselbraun, Albert (2015): Welche Rolle spielt Open Access in der Forschungsevaluation? Open Access Tage. Zürich, 8. September, 2015. Online verfügbar unter https://open-access.net/community/open-access-tage/open-access-tage-2015-zuerich/programm, zuletzt geprüft am 28.05.2021


  • Weichselbraun, Albert; Streiff, Daniel; Scharl, Arno (2015): Consolidating Heterogeneous Enterprise Data for Named Entity Linking and Web Intelligence. In: International Journal on Artificial Intelligence Tools 24. Online verfügbar unter https://doi.org/10.1142/S0218213015400084, zuletzt geprüft am 24.07.2020


    Abstract: Linking named entities to structured knowledge sources paves the way for state-of-the-art Web intelligence applications which assign sentiment to the correct entities, identify trends, and reveal relations between organizations, persons and products. For this purpose this paper introduces Recognyze, a named entity linking component that uses background knowledge obtained from linked data repositories, and outlines the process of transforming heterogeneous data silos within an organization into a linked enterprise data repository which draws upon popular linked open data vocabularies to foster interoperability with public data sets. The presented examples use comprehensive real-world data sets from Orell Füssli Business Information, Switzerland's largest business information provider. The linked data repository created from these data sets comprises more than nine million triples on companies, the companies' contact information, key people, products and brands. We identify the major challenges of tapping into such sources for named entity linking, and describe required data pre-processing techniques to use and integrate such data sets, with a special focus on disambiguation and ranking algorithms. Finally, we conduct a comprehensive evaluation based on business news from the Neue Zürcher Zeitung and AWP Financial News to illustrate how these techniques improve the performance of the Recognyze named entity linking component.


  • Weichselbraun, Albert; Süsstrunk, Norman (2015): Optimizing Dependency Parsing Throughput. In: Fred, Ana; Dietz, Jan; Aveiro, David; Liu, Kecheng; Filipe, Joaquim (Hg.): Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, 1: Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: Lissabon, 12.-14. November: 3, S. 511-516. Online verfügbar unter https://doi.org/10.5220/0005638905110516, zuletzt geprüft am 28.02.2020


    Abstract: Dependency parsing is considered a key technology for improving information extraction tasks. Research indicates that dependency parsers spend more than 95% of their total runtime on feature computations. Based on this insight, this paper investigates the potential of improving parsing throughput by designing feature representations which are optimized for combining single features into more complex feature templates and by optimizing parser constraints. Applying these techniques to MDParser increased its throughput fourfold, yielding Syntactic Parser, a dependency parser that outperforms comparable approaches by a factor of 25 to 400.

  • Weichselbraun, Albert (2015): IMAGINE: Cross-modal information extraction for improved image meta data (Einblicke in die Forschung). Available online at https://www.fhgr.ch/fileadmin/publikationen/forschungsbericht/fhgr-Einblicke_in_die_Forschung_2015.pdf, last checked on 09.04.2021

    Abstract: The findability and marketability of visual content – such as photographs, graphics, and videos – largely depend on the availability of high-quality metadata that enables customers to efficiently locate relevant content in extensive collections. In practice, the corresponding keywording is often performed manually and is therefore cost-intensive and economically feasible for only a small share of the available content. To increase the likelihood that images are displayed (and thus purchased) in searches, quality control would be highly important – particularly for visual content from third-party providers – since such content disproportionately often contains non-relevant keywords.

  • Weichselbraun, Albert (2015): Potential, Methods and Limitations of Text and Data Mining. Text and Data Mining: Discovery of Knowledge in the Digital Age. Consortium of Swiss Academic Libraries. Bern, 9 June 2015. Available online at https://consortium.ch/veranstaltungen/, last checked on 28.05.2021

  • Scharl, Arno; Hubmann-Haidvogel, Alexander; Rafelsberger, Walter; Weichselbraun, Albert; Lang, Heinz-Peter; Sabou, Marta (2014): Visualizing Knowledge Along Semantic and Geographic Dimensions. A Web Intelligence Platform to Explore Climate Change Coverage. In: Okada, Alexandra; Buckingham Shum, Simon J.; Sherborne, Tony (eds.): Knowledge Cartography: London: Springer (Advanced Information and Knowledge Processing), pp. 423-441

    DOI: https://doi.org/10.1007/978-1-4471-6470-8_19 

    Abstract: This chapter presents the Media Watch on Climate Change, a publicly available Web intelligence portal that collects, aggregates and visualizes large archives of digital content from multiple stakeholder groups (documents and user comments from news media, blogs, user-generated content from Facebook, Twitter and YouTube, corporate and NGO Web sites, and a range of other sources). A visual dashboard with trend charts and complex map projections not only shows how often and where environmental information is published, but also provides a real-time account of concepts that stakeholders associate with climate change. Positive or negative sentiment is computed automatically, which sheds light on the impact of education and public outreach campaigns that target environmental literacy, and helps to gain a better understanding of how others perceive climate-related issues.

  • Weichselbraun, Albert (2014): COMET: Cross-media extraction of unified high-quality marketing data (Einblicke in die Forschung). Available online at https://www.fhgr.ch/fileadmin/publikationen/forschungsbericht/fhgr-Einblicke_in_die_Forschung_2014.pdf, last checked on 09.04.2021

    Abstract: Market-relevant data can be found in many media channels – such as print and online media, blogs, and social media – reflecting the public perception of products, their strengths and weaknesses, and the success of public relations and marketing strategies. Manually analyzing these data sets is often infeasible due to the growing number of potential content sources, so in practice business and Web intelligence technologies are used to automatically extract decision-relevant information from these sources.

  • Weichselbraun, Albert; Gindl, Stefan; Scharl, Arno (2014): Enriching semantic knowledge bases for opinion mining in big data applications. In: Knowledge-Based Systems 69, pp. 78-85. Available online at https://doi.org/10.1016/j.knosys.2014.04.039, last checked on 18.05.2021

    Abstract: This paper presents a novel method for contextualizing and enriching large semantic knowledge bases for opinion mining with a focus on Web intelligence platforms and other high-throughput big data applications. The method is not only applicable to traditional sentiment lexicons, but also to more comprehensive, multi-dimensional affective resources such as SenticNet. It comprises the following steps: (i) identify ambiguous sentiment terms, (ii) provide context information extracted from a domain-specific training corpus, and (iii) ground this contextual information to structured background knowledge sources such as ConceptNet and WordNet. A quantitative evaluation shows a significant improvement when using an enriched version of SenticNet for polarity classification. Crowdsourced gold standard data in conjunction with a qualitative evaluation sheds light on the strengths and weaknesses of the concept grounding, and on the quality of the enrichment process.

  • Weichselbraun, Albert; Streiff, Daniel; Scharl, Arno (2014): Linked Enterprise Data for Fine Grained Named Entity Linking and Web Intelligence. In: Akerkar, Rajendra; Bassiliades, Nick; Davies, John; Ermolayev, Vadim (eds.): Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics, Article No. 13: WIMS '14: Thessaloniki, Greece, 2-4 June. Aristotle University of Thessaloniki: New York, NY, USA: Association for Computing Machinery (ACM). Available online at https://doi.org/10.1145/2611040.2611052, last checked on 21.05.2021

    Abstract: To identify trends and assign metadata elements such as location and sentiment to the correct entities, Web intelligence applications require methods for linking named entities and revealing relations between organizations, persons and products. For this purpose we introduce Recognyze, a named entity linking component that uses background knowledge obtained from linked data repositories. This paper outlines the underlying methods, provides insights into the migration of proprietary knowledge sources to linked enterprise data, and discusses the lessons learned from adapting linked data for named entity linking. A large dataset obtained from Orell Füssli, the largest Swiss business information provider, serves as the main showcase. This dataset includes more than nine million triples on companies, their contact information, management, products and brands. We identify major challenges towards applying this data for named entity linking and conduct a comprehensive evaluation based on several news corpora to illustrate how Recognyze helps address them, and how it improves the performance of named entity linking components drawing upon linked data rather than machine learning techniques.

  • Weichselbraun, Albert (2014): Linked Enterprise Data for Fine Grained Named Entity Linking and Web Intelligence. 4th International Conference on Web Intelligence, Mining and Semantics (WIMS 2014). Thessaloniki, Greece, 2 June 2014. Available online at http://wims14.csd.auth.gr/?page_id=1073, last checked on 28.05.2021

  • Gindl, Stefan; Weichselbraun, Albert; Scharl, Arno (2013): Rule-based opinion target and aspect extraction to acquire affective knowledge. In: Schwabe, Daniel; Almeida, Virgílio; Glaser, Hartmut; Baeza-Yates, Ricardo; Moon, Sue (eds.): WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web: WWW '13: Rio de Janeiro, Brazil, 13-17 May: New York, NY, USA: Association for Computing Machinery (ACM), pp. 557-563. Available online at https://doi.org/10.1145/2487788.2487994, last checked on 21.05.2021

    Abstract: Opinion holder and opinion target extraction are among the most popular and challenging problems tackled by opinion mining researchers, recognizing the significant business value of such components and their importance for applications such as media monitoring and Web intelligence. This paper describes an approach that combines opinion target extraction with aspect extraction using syntactic patterns. It expands previous work limited by sentence boundaries and includes a heuristic for anaphora resolution to identify targets across sentences. Furthermore, it demonstrates the application of concepts known from research on open information extraction to the identification of relevant opinion aspects. Qualitative analyses performed on a corpus of 100,000 Amazon product reviews show that the approach is promising. The extracted opinion targets and aspects are useful for enriching common knowledge resources and opinion mining ontologies, and support practitioners and researchers to identify opinions in document collections.

  • Scharl, Arno; Hubmann-Haidvogel, Alexander; Weichselbraun, Albert; Lang, Heinz-Peter; Sabou, Marta (2013): Media Watch on Climate Change. Visual Analytics for Aggregating and Managing Environmental Knowledge from Online Sources. In: Sprague, Ralph H.: 46th Annual Hawaii International Conference on System Sciences: Proceedings: HICSS 46: Grand Wailea, HI, 7-10 January. IEEE Computer Society: Piscataway, NJ: Institute of Electrical and Electronic Engineers (IEEE), pp. 955-964. Available online at https://doi.org/10.1109/HICSS.2013.398, last checked on 27.11.2020

    Abstract: This paper presents the Media Watch on Climate Change, a public Web portal that captures and aggregates large archives of digital content from multiple stakeholder groups. Each week it assesses the domain-specific relevance of millions of documents and user comments from news media, blogs, Web 2.0 platforms such as Facebook, Twitter and YouTube, the Web sites of companies and NGOs, and a range of other sources. An interactive dashboard with trend charts and complex map projections not only shows how often and where environmental information is published, but also provides a real-time account of concepts that stakeholders associate with climate change. Positive or negative sentiment is computed automatically, which not only sheds light on the impact of education and public outreach campaigns that target environmental literacy, but also helps to gain a better understanding of how others perceive climate-related issues.

  • Scharl, Arno; Hubmann-Haidvogel, Alexander; Sabou, Marta; Weichselbraun, Albert; Lang, Heinz-Peter (2013): From Web Intelligence to Knowledge Co-Creation. A Platform for Analyzing and Supporting Stakeholder Communication. In: IEEE Internet Computing 17, pp. 21-29. Available online at https://doi.org/10.1109/MIC.2013.59, last checked on 18.05.2021

    Abstract: Organizations require tools that can assess their online reputations as well as the impact of their marketing and public outreach activities. The Media Watch on Climate Change is a Web intelligence and online collaboration platform that addresses this requirement. It aggregates large archives of digital content from multiple stakeholder groups and enables the co-creation and visualization of evolving knowledge archives. Here, the authors introduce the base platform and a context-aware document editor as an add-on that supports concurrent authoring by multiple users. While documents are being edited, semantic methods analyze them on the fly to recommend related content. The system computes positive or negative sentiment automatically to provide a better understanding of third-party perceptions. The editor is part of an interactive dashboard that uses trend charts and map projections to show how often and where relevant information is published, and to provide a real-time account of concepts that stakeholders associate with a topic.

  • Weichselbraun, Albert; Gindl, Stefan; Scharl, Arno (2013): Extracting and Grounding Contextualized Sentiment Lexicons. In: IEEE Intelligent Systems 28, pp. 39-46. Available online at https://doi.org/10.1109/MIS.2013.41, last checked on 27.11.2020

    Abstract: A context-aware approach based on machine learning and lexical analysis identifies ambiguous terms and stores them in contextualized sentiment lexicons, which ground the terms to concepts corresponding to their polarity.

  • Weichselbraun, Albert; Scharl, Arno; Lang, Heinz-Peter (2013): Knowledge capture from multiple online sources with the extensible web retrieval toolkit (eWRT). In: Benjamins, Richard; d'Aquin, Mathieu; Gordon, Andrew (eds.): Proceedings of the seventh international conference on Knowledge capture: "Knowledge Capture in the Age of Massive Web Data": K-CAP 2013: Banff, Canada, 23-26 June: New York, NY, USA: Association for Computing Machinery (ACM), pp. 129-132. Available online at https://doi.org/10.1145/2479832.2479861, last checked on 21.05.2021

    Abstract: Knowledge capture approaches in the age of massive Web data require robust and scalable mechanisms to acquire, consolidate and pre-process large amounts of heterogeneous data, both unstructured and structured. This paper addresses this requirement by introducing the Extensible Web Retrieval Toolkit (eWRT), a modular Python API for retrieving social data from Web sources such as Delicious, Flickr, Yahoo! and Wikipedia. eWRT has been released as an open source library under GNU GPLv3. It includes classes for caching and data management, and provides low-level text processing capabilities including language detection, phonetic string similarity measures, and string normalization.

  • Lang, Heinz-Peter; Wohlgenannt, Gerhard; Weichselbraun, Albert (2012): TextSweeper. A System for Content Extraction and Overview Page Detection: CONF-IRM 2012 Proceedings: International Conference on Information Resources Management: Vienna, 21-23 May

    Abstract: Web pages not only contain main content, but also other elements such as navigation panels, advertisements and links to related documents. Furthermore, overview pages (summarization pages and entry points) duplicate and aggregate parts of articles and thereby create redundancies. The noise elements in Web pages as well as overview pages affect the performance of downstream processes such as Web-based Information Retrieval. Content extraction's task is to identify and extract the main content from a Web page. In this research-in-progress paper we present an approach which not only identifies and extracts the main content, but also detects overview pages and thereby allows skipping them. The content extraction part of the system is an extension of existing Text-to-Tag ratio methods, overview page detection is accomplished with the net text length heuristic. Preliminary results and ad-hoc evaluation indicate a promising system performance. A formal evaluation and comparison to other state-of-the-art approaches is part of future work.

  • Scharl, Arno; Sabou, Marta; Gindl, Stefan; Rafelsberger, Walter; Weichselbraun, Albert (2012): Leveraging the Wisdom of the Crowds for the Acquisition of Multilingual Language Resources. In: Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Doğan, Mehmet Uğur; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (eds.): Proceedings of the 8th International Conference on Language Resources and Evaluation: International Conference on Language Resources and Evaluation; LREC 2012: Istanbul, 23-25 May: Paris: European Language Resources Association (ELRA). Available online at http://www.lrec-conf.org/proceedings/lrec2012/index.html, last checked on 04.12.2020

    Abstract: Games with a purpose are an increasingly popular mechanism for leveraging the wisdom of the crowds to address tasks which are trivial for humans but still not solvable by computer algorithms in a satisfying manner. As a novel mechanism for structuring human-computer interactions, a key challenge when creating them is motivating users to participate while generating useful and unbiased results. This paper focuses on important design choices and success factors of effective games with a purpose. Our findings are based on lessons learned while developing and deploying Sentiment Quiz, a crowdsourcing application for creating sentiment lexicons (an essential component of most sentiment detection algorithms). We describe the goals and structure of the game, the underlying application framework, the sentiment lexicons gathered through crowdsourcing, as well as a novel approach to automatically extend the lexicons by means of a bootstrapping process. Such an automated extension further increases the efficiency of the acquisition process by limiting the number of terms that need to be gathered from the game participants.

  • Scharl, Arno; Hubmann-Haidvogel, Alexander; Weichselbraun, Albert; Wohlgenannt, Gerhard; Lang, Heinz-Peter; Sabou, Marta (2012): Extraction and interactive exploration of knowledge from aggregated news and social media content. In: Barbosa, Simone D.J.; Campos, José Creissac; Kazman, Rick; Palanque, Philippe; Harrison, Michael; Reeves, Steve (eds.): Proceedings of the 2012 ACM SIGCHI Symposium on Engineering Interactive Computing Systems: EICS '12: Copenhagen, 25-26 June: New York, NY, USA: Association for Computing Machinery (ACM), pp. 163-168. Available online at https://doi.org/10.1145/2305484.2305511, last checked on 21.05.2021

    Abstract: The webLyzard media monitoring and Web intelligence platform (www.webLyzard.com) presented in this paper is a generic tool for assessing the strategic positioning of an organization and the effectiveness of its communication strategies. The platform captures and aggregates large archives of digital content from multiple stakeholder groups. Each week it processes millions of documents and user comments from news media, blogs, Web 2.0 platforms such as Facebook, Twitter and YouTube, the Web sites of companies and NGOs, and other sources. An interactive dashboard with trend charts and complex map projections shows how often and where information is published. It also provides a real-time account of topics that stakeholders associate with an organization. Positive or negative sentiment is computed automatically, which reflects the impact of public relations and marketing campaigns.

  • Syed, Kamran Ali Ahmad; Kröll, Mark; Sabol, Vedran; Scharl, Arno; Gindl, Stefan; Granitzer, Michael; Weichselbraun, Albert (2012): Dynamic Topography Information Landscapes. An Incremental Approach to Visual Knowledge Discovery. In: Cuzzocrea, Alfredo; Dayal, Umeshwar (eds.): Data Warehousing and Knowledge Discovery: 14th International Conference: DaWaK 2012: Vienna, 3-6 September: Berlin, Heidelberg: Springer (Lecture Notes in Computer Science), pp. 352-363

    DOI: https://doi.org/10.1007/978-3-642-32584-7_29 

    Abstract: Incrementally computed information landscapes are an effective means to visualize longitudinal changes in large document repositories. Resembling tectonic processes in the natural world, dynamic rendering reflects both long-term trends and short-term fluctuations in such repositories. To visualize the rise and decay of topics, the mapping algorithm elevates and lowers related sets of concentric contour lines. Addressing the growing number of documents to be processed by state-of-the-art knowledge discovery applications, we introduce an incremental, scalable approach for generating such landscapes. The processing pipeline includes a number of sequential tasks, from crawling, filtering and pre-processing Web content to projecting, labeling and rendering the aggregated information. Incremental processing steps are localized in the projection stage consisting of document clustering, cluster force-directed placement and fast document positioning. We evaluate the proposed framework by contrasting layout qualities of incremental versus non-incremental versions. Documents for the experiments stem from the blog sample of the Media Watch on Climate Change (www.ecoresearch.net/climate). Experimental results indicate that our incremental computation approach is capable of accurately generating dynamic information landscapes.

  • Weichselbraun, Albert (2012): Onlineaktivitäten optimieren [Optimizing online activities]. In: Wissensplatz, pp. 6-7. Available online at https://www.fhgr.ch/fhgr/medien-und-oeffentlichkeit/publikationen/wissensplatz/februar-2012/, last checked on 31.01.2019

    Abstract: Social networks such as Facebook or Xing, online booking and sales platforms, and the reviews published on them are becoming ever more important. With its new focus on Web monitoring and Web intelligence, the HTW Chur now offers companies in Grisons, for the first time, the opportunity to optimize their online strategy and to use Web-based data sources for strategic decisions and product development.

  • Weichselbraun, Albert (2012): Coping with Evolving Knowledge. Dynamic Domain Ontologies for Web Intelligence. Invited Speech. 11th International Workshop on Web Semantics and Information (WebS 2012) in conjunction with the 23rd International Conference on Database and Expert Systems Applications (DEXA 2012). Vienna University of Technology. Vienna, 5 September 2012

  • Wohlgenannt, Gerhard; Weichselbraun, Albert; Scharl, Arno; Sabou, Marta (2012): Dynamic Integration of Multiple Evidence Sources for Ontology Learning. In: Journal of Information and Data Management 3, pp. 243-254. Available online at https://periodicos.ufmg.br/index.php/jidm/article/view/166, last checked on 04.12.2020

    Abstract: Although ontologies are central to the Semantic Web, current ontology learning methods primarily make use of a single evidence source and are agnostic in their internal representations to the evolution of ontology knowledge. This article presents a continuous ontology learning framework that overcomes these shortcomings by integrating evidence from multiple, heterogeneous sources (unstructured, structured, social) in a consistent model, and by providing mechanisms for the fine-grained tracing of the evolution of domain ontologies. The presented framework supports a tight integration of human and machine computation. Crowdsourcing in the tradition of games with a purpose performs the evaluation of the learned ontologies and facilitates the automatic optimization of learning algorithms.

  • Wohlgenannt, Gerhard; Weichselbraun, Albert; Scharl, Arno; Sabou, Marta (2012): Confidence Management for Learning Ontologies from Dynamic Web Sources. In: Filipe, Joaquim; Dietz, Jan (eds.): 4th International Conference on Knowledge Engineering and Ontology Development: Proceedings: KEOD 2012: Barcelona, October: Setúbal: SciTePress, pp. 172-177

    Abstract: Dynamic environments require effective update mechanisms for ontologies to incorporate new knowledge. In this position paper we present a dynamic framework for ontology learning which integrates automated learning methods with rapid user feedback mechanisms to build and extend lightweight domain ontologies at regular intervals. Automated methods collect evidence from a variety of heterogeneous sources and generate an ontology with spreading activation techniques, while crowdsourcing in the form of Games with a Purpose validates the new ontology elements. Special data structures support dynamic confidence management with regard to three major aspects of the ontology: (i) the incoming facts collected from evidence sources, (ii) the relations that constitute the extended ontology, and (iii) the observed quality of evidence sources. Based on these data structures we propose trend detection experiments to measure not only significant changes in the domain, but also in the conceptualization suggested by user feedback.

  • Scharl, Arno; Weichselbraun, Albert (2011): Context-Aware Sentiment Detection. Curtin University. Digital Ecosystems and Business Intelligence Institute. Perth, Australia, 10 February 2011

  • Scharl, Arno; Hubmann-Haidvogel, Alexander; Wohlgenannt, Gerhard; Weichselbraun, Albert; Dickinger, Astrid (2011): Scalable annotation mechanisms for digital content aggregation and context-aware authoring. In: Sieckenius de Souza, Clarisse; Sánchez, J. Alfredo; Gomes, Alex Sandro (eds.): Proceedings of the 10th Brazilian Symposium on Human Factors in Computing Systems and the 5th Latin American Conference on Human-Computer Interaction: IHC+CLIHC 2011: Porto de Galinhas, 25-28 October. ACM Special Interest Group on Computer-Human Interaction: Porto Alegre, Brazil: Brazilian Computer Society, pp. 376-380. Available online at https://dl.acm.org/doi/10.5555/2254436.2254498, last checked on 04.12.2020

    Abstract: This paper discusses the role of context information in building the next generation of human-centered information systems, and classifies the various aspects of contextualization with a special emphasis on the production and consumption of digital content. The real-time annotation of resources is a crucial element when moving from content aggregators (which process third-party digital content) to context-aware visual authoring environments (which allow users to create and edit their own documents). We present a publicly available prototype of such an environment, which required a major redesign of an existing Web intelligence and media monitoring framework to provide real-time data services and synchronize the text editor with the frontend's visual components. The paper concludes with a summary of achieved results and an outlook on possible future research avenues including multi-user support and the visualization of document evolution.

  • Weichselbraun, Albert (2011): Ontology Learning based on Text Mining and Social Evidence Sources. University of Western Australia. School of Computer Science and Software Engineering. Perth, Australia, 9 February 2011

  • Weichselbraun, Albert (2011): Optimizing queries to remote resources. In: Journal of Intelligent Information Systems 37, pp. 119-137. Available online at https://doi.org/10.1007/s10844-010-0129-0, last checked on 04.12.2020

    Abstract: One key property of the Semantic Web is its support for interoperability. Recent research in this area focuses on the integration of multiple data sources to facilitate tasks such as ontology learning, user query expansion and context recognition. The growing popularity of such mashups and the rising number of Web APIs supporting links between heterogeneous data providers call for intelligent methods to spare remote resources and minimize delays imposed by queries to external data sources. This paper suggests a cost and utility model for optimizing such queries by leveraging optimal stopping theory from business economics: applications are modeled as decision makers that look for optimal answer sets. Queries to remote resources cause additional cost but retrieve valuable information which improves the estimation of the answer set’s utility. Optimal stopping optimizes the trade-off between query cost and answer utility yielding optimal query strategies for remote resources. These strategies are compared to conventional approaches in an extensive evaluation based on real world response times taken from seven popular Web services.

  • Weichselbraun, Albert; Gindl, Stefan; Scharl, Arno (2011): Using games with a purpose and bootstrapping to create domain-specific sentiment lexicons. In: Berendt, Bettina; de Vries, Arjen; Fan, Wenfei; Macdonald, Craig; Ounis, Iadh; Ruthven, Ian (eds.): Proceedings of the 20th International Conference on Information & Knowledge Management and co-located workshops: International Conference on Information and Knowledge Management, CIKM 2011: Glasgow, 24-28 October. Association for Computing Machinery: New York, NY: Association for Computing Machinery (ACM), pp. 1053-1060. Available online at https://doi.org/10.1145/2063576.2063729, last checked on 04.12.2020

    Abstract: Sentiment detection analyzes the positive or negative polarity of text. The field has received considerable attention in recent years, since it plays an important role in providing means to assess user opinions regarding an organization's products, services, or actions. Approaches towards sentiment detection include machine learning techniques as well as computationally less expensive methods. Both approaches rely on the use of language-specific sentiment lexicons, which are lists of sentiment terms with their corresponding sentiment value. The effort involved in creating, customizing, and extending sentiment lexicons is considerable, particularly if less common languages and domains are targeted without access to appropriate language resources. This paper proposes a semi-automatic approach for the creation of sentiment lexicons which assigns sentiment values to sentiment terms via crowd-sourcing. Furthermore, it introduces a bootstrapping process operating on unlabeled domain documents to extend the created lexicons, and to customize them according to the particular use case. This process considers sentiment terms as well as sentiment indicators occurring in the discourse surrounding a particular topic. Such indicators are associated with a positive or negative context in a particular domain, but might have a neutral connotation in other domains. A formal evaluation shows that bootstrapping considerably improves the method's recall. Automatically created lexicons yield a performance comparable to professionally created language resources such as the General Inquirer.

  • Weichselbraun, Albert; Wohlgenannt, Gerhard; Scharl, Arno (2011): Applying Optimal Stopping Theory to Improve the Performance of Ontology Refinement Methods. In: Sprague, Ralph H.: Proceedings of the 44th Annual Hawaii International Conference on System Sciences: HICSS '11: Koloa, Kauai, Hawaii, 4-7 January. Institute of Electrical and Electronic Engineers: Piscataway, NJ: Institute of Electrical and Electronic Engineers (IEEE). Available online at https://doi.org/10.1109/HICSS.2011.72, last checked on 04.12.2020

    Abstract: Recent research shows the potential of utilizing data collected through Web 2.0 applications to capture domain evolution. Relying on external data sources, however, often introduces delays due to the time spent retrieving data from these sources. The method introduced in this paper streamlines the data acquisition process by applying optimal stopping theory. An extensive evaluation demonstrates how such an optimization improves the processing speed of an ontology refinement component which uses Delicious to refine ontologies constructed from unstructured textual data while having no significant impact on the quality of the refinement process. Domain experts compare the results retrieved from optimal stopping with data obtained from standardized techniques to assess the effect of optimal stopping on data quality and the created domain ontology.


  • Weichselbraun, Albert; Wohlgenannt, Gerhard; Scharl, Arno (2011) : Evidence Sources, Methods and Use Cases for Learning Lightweight Domain Ontologies In: Wong, Wilson; Liu, Wei; Bennamoun, Mohammed (Hg.): Ontology Learning and Knowledge Discovery Using the Web: Hershey, PA: IGI Global, S. 1-15

    DOI: https://doi.org/10.4018/978-1-60960-625-1.ch001 

    Abstract: By providing interoperability and shared meaning across actors and domains, lightweight domain ontologies are a cornerstone technology of the Semantic Web. This chapter investigates evidence sources for ontology learning and describes a generic and extensible approach to ontology learning that combines such evidence sources to extract domain concepts, identify relations between the ontology’s concepts, and detect relation labels automatically. An implementation illustrates the presented ontology learning and relation labeling framework and serves as the basis for discussing possible pitfalls in ontology learning. Afterwards, three use cases demonstrate the usefulness of the presented framework and its application to real-world problems.


  • Gindl, Stefan; Scharl, Arno; Weichselbraun, Albert (2010) : Generic high-throughput methods for multilingual sentiment detection: 4th IEEE International Conference on Digital Ecosystems and Technologies: Proceedings: IEEE-DEST: Dubai, 13. - 16. April: Piscataway, NJ: Institute of Electrical and Electronic Engineers (IEEE), S. 239-244. Online verfügbar unter https://doi.org/10.1109/DEST.2010.5610641, zuletzt geprüft am 11.12.2020

     

    Abstract: Digital ecosystems typically involve a large number of participants from different sectors who generate rapidly growing archives of unstructured text. Measuring the frequency of certain terms to determine the popularity of a topic is comparably straightforward. Detecting sentiment expressed in user-generated electronic content is more challenging, especially in the case of digital ecosystems comprising heterogeneous sets of multilingual documents. This paper describes the use of language-specific grammar patterns and multilingual tagged dictionaries to detect sentiment in German and English document repositories. Digital ecosystems may contain millions of frequently updated documents, requiring sentiment detection methods that maximize throughput. The ideal combination of high-throughput techniques and more accurate (but slower) approaches depends on the specific requirements of an application. To accommodate a wide range of possible applications, this paper presents (i) an adaptive method, balancing accuracy and scalability of multilingual textual sources, (ii) a generic approach for generating language-specific grammar patterns and multilingual tagged dictionaries, and (iii) an extensive evaluation verifying the method's performance based on Amazon product reviews and user evaluations from Sentiment Quiz, a “game with a purpose” that invites users of the Facebook social networking platform to assess the sentiment of individual sentences.


  • Gindl, Stefan; Weichselbraun, Albert; Scharl, Arno (2010) : Cross-Domain Contextualization of Sentiment Lexicons In: Coelho, Helder; Studer, Rudi; Wooldridge, Michael (Hg.): Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence: Amsterdam: IOS Press, S. 771-776

    Abstract: The simplicity of using Web 2.0 platforms and services has resulted in an abundance of user-generated content. A significant part of this content contains user opinions with clear economic relevance: customer and travel reviews, for example, or the articles of well-known and respected bloggers who influence purchase decisions. Analyzing and acting upon user-generated content is becoming imperative for marketers and social scientists who aim to gather feedback from very large user communities. Sentiment detection, as part of opinion mining, supports these efforts by identifying and aggregating polar opinions, i.e., positive or negative statements about facts. For achieving accurate results, sentiment detection requires a correct interpretation of language, which remains a challenging task due to the inherent ambiguities of human languages. Particular attention has to be directed to the context of opinionated terms when trying to resolve these ambiguities. Contextualized sentiment lexicons address this need by considering the sentiment term's context in their evaluation but are usually limited to one domain, as many contextualizations are not stable across domains. This paper introduces a method which identifies unstable contextualizations and refines the contextualized sentiment dictionaries accordingly, eliminating the need for specific training data for each individual domain. An extensive evaluation compares the accuracy of this approach with results obtained from domain-specific corpora.


  • Scharl, Arno; Weichselbraun, Albert (2010) : Building a Web-Based Knowledge Repository on Climate Change to Support Environmental Communities In: Lytras, Miltiadis D.; Ordonez de Pablos, Patricia; Ziderman, Adrian; Roulstone, Alan; Maurer, Hermann; Imber, Jonathan B. (Hg.): Organizational, Business, and Technological Aspects of the Knowledge Society: Proceedings, Part II: Third World Summit on the Knowledge Society, WSKS 2010: Corfu, 22.-24. September: Berlin, Heidelberg: Springer (Communications in Computer and Information Science), S. 79-84

    DOI: https://doi.org/10.1007/978-3-642-16324-1_9 

    Abstract: This paper presents the technology base and roadmap of the Climate Change Collaboratory, a Web-based platform that aims to strengthen the relations between scientists, educators, environmental NGOs, policy makers, news media and corporations - stakeholders who recognize the need for adaptation and mitigation, but differ in world-views, goals and agendas. The collaboratory manages expert knowledge and provides a platform for effective communication and collaboration. It aims to assist networking with leading international organizations, bridges the science-policy gap and promotes rich, self-sustaining community interaction to translate knowledge into coordinated action. Innovative survey instruments in the tradition of “games with a purpose” will create shared meaning through collaborative ontology building and leverage social networking platforms to capture indicators of environmental attitudes, lifestyles and behaviors.


  • Weichselbraun, Albert; Gindl, Stefan; Scharl, Arno (2010): A Context-Dependent Supervised Learning Approach to Sentiment Detection in Large Textual Databases. In: Journal of Information and Data Management 1, S. 329-342. Online verfügbar unter https://periodicos.ufmg.br/index.php/jidm/article/view/54, zuletzt geprüft am 11.12.2020

     

    Abstract: Sentiment detection automatically identifies emotions in textual data. The increasing amount of emotive documents available in corporate databases and on the World Wide Web calls for automated methods to process this important source of knowledge. Sentiment detection draws attention from researchers and practitioners alike - to enrich business intelligence applications, for example, or to measure the impact of customer reviews on purchasing decisions. Most sentiment detection approaches do not consider language ambiguity, despite the fact that one and the same sentiment term might differ in polarity depending on the context in which a statement is made. To address this shortcoming, this paper introduces a novel method that uses Naive Bayes to identify ambiguous terms. A contextualized sentiment lexicon stores the polarity of these terms, together with a set of co-occurring context terms. A formal evaluation of the assigned polarities confirms that considering the usage context of ambiguous terms improves the accuracy of high-throughput sentiment detection methods. Such methods are a prerequisite for using sentiment as a metadata element in storage and distributed file-level intelligence applications, as well as in enterprise portals that provide a semantic repository of an organization's information assets.
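
    The contextualized-lexicon idea described in this abstract can be illustrated with a tiny Naive Bayes sketch that assigns a polarity to an ambiguous term from its co-occurring context terms. The class name, the "+"/"-" labels, and the training data are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter, defaultdict
import math

class ContextualPolarity:
    """Naive Bayes over context terms: decide whether an ambiguous
    sentiment term (e.g. "cold") is used positively or negatively."""

    def __init__(self):
        self.class_counts = Counter()            # polarity -> number of contexts seen
        self.term_counts = defaultdict(Counter)  # polarity -> context term -> count
        self.vocab = set()

    def train(self, context_terms, polarity):
        self.class_counts[polarity] += 1
        for term in context_terms:
            self.term_counts[polarity][term] += 1
            self.vocab.add(term)

    def classify(self, context_terms):
        total = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for polarity, count in self.class_counts.items():
            lp = math.log(count / total)  # class prior
            denom = sum(self.term_counts[polarity].values()) + len(self.vocab)
            for term in context_terms:
                # Laplace smoothing keeps unseen terms from zeroing the score
                lp += math.log((self.term_counts[polarity][term] + 1) / denom)
            if lp > best_lp:
                best, best_lp = polarity, lp
        return best
```

    After training on a few positive contexts ("cold" near "beer", "refreshing") and negative ones ("cold" near "soup", "served"), classifying the context ["beer"] returns the positive label.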


  • Weichselbraun, Albert; Wohlgenannt, Gerhard; Scharl, Arno (2010): Refining non-taxonomic relation labels with external structured data to support ontology learning. In: Data & Knowledge Engineering 69, S. 763-778. Online verfügbar unter https://doi.org/10.1016/j.datak.2010.02.010, zuletzt geprüft am 11.12.2020

     

    Abstract: This paper presents a method to integrate external knowledge sources such as DBpedia and OpenCyc into an ontology learning system that automatically suggests labels for unknown relations in domain ontologies based on large corpora of unstructured text. The method extracts and aggregates verb vectors from semantic relations identified in the corpus. It composes a knowledge base which consists of (i) verb centroids for known relations between domain concepts, (ii) mappings between concept pairs and the types of known relations, and (iii) ontological knowledge retrieved from external sources. Applying semantic inference and validation to this knowledge base improves the quality of suggested relation labels. A formal evaluation compares the accuracy and average ranking precision of this hybrid method with the performance of methods that solely rely on corpus data and those that are only based on reasoning and external data sources.


  • Weichselbraun, Albert; Wohlgenannt, Gerhard; Scharl, Arno (2010) : Augmenting Lightweight Domain Ontologies with Social Evidence Sources In: Tjoa, A. M.; Wagner, R. R. (Hg.): 21st International Conference on Database and Expert Systems Applications: Proceedings: DEXA: Bilbao, 30. August - 3. September: Piscataway, NJ: Institute of Electrical and Electronic Engineers (IEEE), S. 193-197. Online verfügbar unter https://doi.org/10.1109/DEXA.2010.53, zuletzt geprüft am 12.11.2020

     

    Abstract: Recent research shows the potential of utilizing data collected through Web 2.0 applications to capture changes in a domain's terminology. This paper presents an approach to augment corpus-based ontology learning by considering terms from collaborative tagging systems, social networking platforms, and micro-blogging services. The proposed framework collects information on the domain's terminology from domain documents and a seed ontology in a triple store. Data from social sources such as Delicious, Flickr, Technorati and Twitter provide an outside view of the domain and help incorporate external knowledge into the ontology learning process. The neural network technique of spreading activation is used to identify relevant new concepts, and to determine their positions in the extended ontology. Evaluating the method with two measures (PMI and expert judgments) demonstrates the significant benefits of social evidence sources for ontology learning.
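
    The spreading-activation step mentioned above might be sketched as follows; the toy concept graph, the decay factor, and the step count are assumptions chosen for illustration, not the evaluated system.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Seed concepts inject energy that propagates over weighted links;
    high-energy non-seed terms become candidate ontology concepts."""
    energy = dict(seeds)  # node -> current activation
    for _ in range(steps):
        pulse = {}
        for node, e in list(energy.items()):
            for neighbour, weight in graph.get(node, {}).items():
                pulse[neighbour] = pulse.get(neighbour, 0.0) + e * weight * decay
        for node, e in pulse.items():
            energy[node] = energy.get(node, 0.0) + e
    # rank non-seed nodes as suggestions for extending the ontology
    return sorted((n for n in energy if n not in seeds),
                  key=lambda n: energy[n], reverse=True)
```

    With a seed concept "climate" linked to "emission" and "warming", and "emission" linked to "co2", two activation pulses rank "emission" first and also surface the indirectly connected "co2".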


  • Weichselbraun, Albert; Wohlgenannt, Gerhard; Scharl, Arno (2010): Augmenting Lightweight Domain Ontologies with Social Evidence Sources. 9th International Workshop on Web Semantics; 21st International Conference on Database and Expert Systems Application (DEXA 2010). University of Deusto. Bilbao, Spain, 31. August, 2010


  • Gindl, Stefan; Liegl, Johannes; Scharl, Arno; Weichselbraun, Albert (2009) : An Evaluation Framework and Adaptive Architecture for Automated Sentiment Detection In: Pellegrini, Tassilo; Auer, Sören; Tochtermann, Klaus; Schaffert, Sebastian (Hg.): Networked Knowledge, Networked Media: Integrating Knowledge Management, New Media Technologies and Semantic: Berlin, Heidelberg: Springer (Studies in Computational Intelligence), S. 217-234

    DOI: https://doi.org/10.1007/978-3-642-02184-8_15 

    Abstract: Analysts are often interested in how sentiment towards an organization, a product or a particular technology changes over time. Popular methods that process unstructured textual material to automatically detect sentiment based on tagged dictionaries are not capable of fulfilling this task, even when coupled with part-of-speech tagging, a standard component of most text processing toolkits that distinguishes grammatical categories such as article, noun, verb, and adverb. Small corpus size, ambiguity and subtle incremental change of tonal expressions between different versions of a document complicate sentiment detection. Parsing grammatical structures, by contrast, outperforms dictionary-based approaches in terms of reliability, but usually suffers from poor scalability due to its computational complexity. This work provides an overview of different dictionary- and machine-learning-based sentiment detection methods and evaluates them on several Web corpora. After identifying the shortcomings of these methods, the paper proposes an approach based on automatically building Tagged Linguistic Unit (TLU) databases to overcome the restrictions of dictionaries with a limited set of tagged tokens.


  • Hubmann-Haidvogel, Alexander; Scharl, Arno; Weichselbraun, Albert (2009): Multiple coordinated views for searching and navigating Web content repositories. In: Information Sciences 179, S. 1813-1821. Online verfügbar unter https://doi.org/10.1016/j.ins.2009.01.030, zuletzt geprüft am 15.01.2021

     

    Abstract: The advantages and positive effects of multiple coordinated views on search performance have been documented in several studies. This paper describes the implementation of multiple coordinated views within the Media Watch on Climate Change, a domain-specific news aggregation portal available at www.ecoresearch.net/climate that combines a portfolio of semantic services with a visual information exploration and retrieval interface. The system builds contextualized information spaces by enriching the content repository with geospatial, semantic and temporal annotations, and by applying semi-automated ontology learning to create a controlled vocabulary for structuring the stored information. Portlets visualize the different dimensions of the contextualized information spaces, providing the user with multiple views on the latest news media coverage. Context information facilitates access to complex datasets and helps users navigate large repositories of Web documents. Currently, the system synchronizes information landscapes, domain ontologies, geographic maps, tag clouds and just-in-time information retrieval agents that suggest similar topics and nearby locations.


  • Juffinger, Andreas; Neidhart, Thomas; Granitzer, Michael; Kern, Roman; Weichselbraun, Albert; Wohlgenannt, Gerhard; Scharl, Arno (2009): Distributed Web 2.0 Crawling for Ontology Evolution. In: Journal of Digital Information Management 7, S. 114-119


  • Pollach, Irene; Scharl, Arno; Weichselbraun, Albert (2009): Web content mining for comparing corporate and third-party online reporting. A case study on solid waste management. In: Business Strategy and the Environment 18, S. 137-148. Online verfügbar unter https://doi.org/10.1002/bse.549, zuletzt geprüft am 15.01.2021

     

    Abstract: This study investigates the coverage of solid waste management on 1142 websites maintained by companies, news media and non‐governmental organizations to validate an automated approach to content and language analysis. First, a frequency analysis of waste management terms sheds light on the breadth and depth of their environmental discourses, revealing that corporate and media attention to waste management is small compared with that of non‐governmental organizations. Second, an investigation of their attitudes toward waste management suggests that companies avoid negative information in environmental communication, unlike news media or non‐governmental organizations. Ultimately, an automated tool for ontology building is employed to gain insights into companies' shared understanding of waste management. The ontology obtained indicates that companies conceptualize waste management as a business process rather than framing it from an ecological perspective, which is in line with findings from previous research.


  • Weichselbraun, Albert; Wohlgenannt, Gerhard; Scharl, Arno; Granitzer, Michael; Neidhart, Thomas; Juffinger, Andreas (2009): Discovery and evaluation of non-taxonomic relations in domain ontologies. In: International Journal of Metadata, Semantics and Ontologies 4

    DOI: https://doi.org/10.1504/IJMSO.2009.027755 

    Abstract: The identification and labelling of non-hierarchical relations are among the most challenging tasks in ontology learning. This paper describes a bottom-up approach for automatically suggesting ontology link types. The presented method extracts verb-vectors from semantic relations identified in the domain corpus, aggregates them by computing centroids for known relation types, and stores the centroids in a central knowledge base. Comparing verb-vectors extracted from unknown relations with the stored centroids yields link type suggestions. Domain experts evaluate these suggestions, refining the knowledge base and constantly improving the component's accuracy. A final evaluation provides a detailed statistical analysis of the introduced approach.
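
    The centroid comparison described above can be sketched in a few lines; the cosine measure and the toy verb vectors are illustrative assumptions, not the paper's exact aggregation scheme.

```python
import math
from collections import defaultdict

def centroid(vectors):
    """Average the verb-frequency vectors of all relations of one known type."""
    c = defaultdict(float)
    for v in vectors:
        for verb, weight in v.items():
            c[verb] += weight / len(vectors)
    return dict(c)

def cosine(a, b):
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def suggest_labels(unknown_vector, centroids):
    """Rank known relation types by similarity to an unlabeled relation."""
    return sorted(centroids,
                  key=lambda t: cosine(unknown_vector, centroids[t]),
                  reverse=True)
```

    An unknown relation whose verb vector overlaps with "causes"/"triggers" is ranked closer to a "causes" centroid than to an unrelated "partOf" centroid.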


  • Weichselbraun, Albert (2009) : Applying Optimal Stopping for Optimizing Queries to External Semantic Web Resources In: Cordeiro, José; Shishkov, Boris; Ranchordas, AlpeshKumar; Helfert, Markus (Hg.): Software and data technologies: Third international conference, ICSOFT 2008, Porto, Portugal, July 22 - 24, 2008; revised selected papers, 47. ICSOFT 2008: Berlin: Springer (Communications in Computer and Information Science), S. 105-118

    DOI: https://doi.org/10.1007/978-3-642-05201-9_9 

    Abstract: The rapid increase in the amount of available information from various online sources poses new challenges for programs that endeavor to process these sources automatically and identify the most relevant material for a given application. This paper introduces an approach for optimizing queries to Semantic Web resources based on ideas originally proposed by MacQueen for optimal stopping in business economics. Modeling applications as decision makers looking for optimal action/answer sets, facing search costs for acquiring information, test costs for checking this information, and receiving a reward depending on the usefulness of the proposed solution, yields strategies for optimizing queries to external services. An extensive evaluation compares these strategies to a conventional coverage-based approach, based on real-world response times taken from popular Web services.
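
    The stopping rule itself might look like the toy sketch below: keep querying external sources while the expected reward of one more query (here estimated as the running mean of past gains) covers its cost. The running-mean estimate and the deduplication step are assumptions for illustration, not MacQueen's exact model.

```python
def query_with_stopping(sources, query_cost, value_of):
    """Query sources in sequence; stop once the expected gain of the
    next query drops below the cost of issuing it."""
    results, gains = [], []
    for fetch in sources:
        answers = fetch()
        new = [a for a in answers if a not in results]  # count only novel answers
        gains.append(sum(value_of(a) for a in new))
        results.extend(new)
        expected_next = sum(gains) / len(gains)  # running mean of past gains
        if expected_next < query_cost:
            break  # further queries are not expected to pay off
    return results
```

    With four overlapping sources and a per-query cost of 1.5, the loop stops after the third source once marginal gains fall off, never issuing the fourth query.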


  • Weichselbraun, Albert (2009) : A Utility Centered Approach for Evaluating and Optimizing Geo-Tagging: International Conference on Knowledge Discovery and Information Retrieval: KDIR: Madeira, 6. - 8. Oktober, S. 134-139

    Abstract: Geo-tagging is the process of annotating a document with its geographic focus by extracting a unique locality that describes the geographic context of the document as a whole (Amitay et al., 2004). Accurate geographic annotations are crucial for geospatial applications such as Google Maps or the IDIOM Media Watch on Climate Change (Hubmann-Haidvogel et al., 2009), but many obstacles complicate the evaluation of such tags. This paper introduces an approach for optimizing geo-tagging by applying the concept of utility from economic theory to tagging results. Computing utility scores for geo-tags allows a fine-grained evaluation of the tagger's performance in regard to multiple dimensions specified in use-case-specific domain ontologies and provides means for addressing problems such as different scope and coverage of evaluation corpora. The integration of external data sources and evaluation ontologies with user profiles ensures that the framework considers use-case-specific requirements. The presented model is instrumental in comparing different geo-tagging settings, evaluating the effect of design decisions, and customizing geo-tagging to particular use cases.


  • Weichselbraun, Albert (2009): A Utility Centered Approach for Evaluating and Optimizing Geo-Tagging. First International Conference on Knowledge Discovery and Information Retrieval (KDIR 2009). Madeira, Portugal, 7. Oktober, 2009

    Abstract: Geo-tagging is the process of annotating a document with its geographic focus by extracting a unique locality that describes the geographic context of the document as a whole. Accurate geographic annotations are crucial for geospatial applications such as Google Maps or the IDIOM Media Watch on Climate Change, but many obstacles complicate the evaluation of such tags. This paper introduces an approach for optimizing geo-tagging by applying the concept of utility from economic theory to tagging results. Computing utility scores for geo-tags allows a fine-grained evaluation of the tagger's performance in regard to multiple dimensions specified in use-case-specific domain ontologies and provides means for addressing problems such as different scope and coverage of evaluation corpora. The integration of external data sources and evaluation ontologies with user profiles ensures that the framework considers use-case-specific requirements. The presented model is instrumental in comparing different geo-tagging settings, evaluating the effect of design decisions, and customizing geo-tagging to particular use cases.


  • Wohlgenannt, Gerhard; Weichselbraun, Albert; Scharl, Arno (2009) : Integrating Structural Data into Methods for Labeling Relations in Domain Ontologies: 20th International Workshop on Database and Expert Systems Application: Proceedings: DEXA: Linz, 31. August - 4. September: Piscataway, NJ: Institute of Electrical and Electronic Engineers (IEEE), S. 94-98. Online verfügbar unter https://doi.org/10.1109/DEXA.2009.26, zuletzt geprüft am 15.01.2021

     

    Abstract: This paper presents a method for integrating DBpedia data into an ontology learning system that automatically suggests labels for relations in domain ontologies based on large corpora of unstructured text. The method extracts and aggregates verb vectors for semantic relations identified in the corpus. It composes a knowledge base which consists of (i) centroids for known relations between domain concepts, (ii) mappings between concept pairs and the types of known relations, and (iii) ontological knowledge retrieved from DBpedia. Refining the similarities between the verb centroids of labeled and unlabeled relations with domain and range constraints derived from the DBpedia data yields relation type suggestions. A formal evaluation compares the accuracy and average ranking performance of this hybrid method with the performance of methods that solely rely on corpus data and those that are only based on reasoning and external data sources.


  • Dickinger, Astrid; Scharl, Arno; Stern, Hermann; Weichselbraun, Albert; Wöber, Karl (2008) : Acquisition and Relevance of Geotagged Information in Tourism In: O’Connor, Peter; Höpken, Wolfram; Gretzel, Ulrike (Hg.): Information and Communication Technologies in Tourism 2008: Wien: Springer, S. 545-555

    DOI: https://doi.org/10.1007/978-3-211-77280-5_48 

    Abstract: In the case of tourism applications, it is particularly evident that geography is emerging as a fundamental principle for structuring Web resources. Recent improvements in semantic and geographic Web technology, often referred to as the Geospatial Web, acknowledge the relevance of adding location metadata to existing databases and accessing the vast amounts of information stored in these databases via geospatial services. This paper outlines the acquisition of geospatial context information, describes usage scenarios and real-world applications in the tourism industry, and presents an automated software tool for annotating large collections of Web documents automatically. The quality of this tool is tested based upon Web pages from the Austrian National Tourism Organization. Initial results are encouraging and help define a roadmap for further improving the automated tagging of tourism resources.


  • Hubmann-Haidvogel, Alexander; Scharl, Arno; Weichselbraun, Albert (2008) : Tightly coupled views for navigating content repositories In: Farias, Cléver Ricardo Guareis de; Almeida, João Paulo Andrade; Filho, José Gonçalves Pereira (Hg.): Companion Proceedings of the XIV Brazilian Symposium on Multimedia and the Web: WebMedia '08: Vila Velha, Espírito Santo, Brazil, 26. - 29. Oktober: New York, NY: ACM Press, S. 5-8. Online verfügbar unter https://doi.org/10.1145/1809980.1809983, zuletzt geprüft am 15.01.2021

     

    Abstract: The advantages and positive effects of tightly coupled interface components on search performance have been documented in several studies. This paper focuses on the implementation of tightly coupled views within the Media Watch on Climate Change, an interactive Web portal (www.ecoresearch.net/climate) combining a portfolio of semantic services with a visual exploration and information retrieval interface. The portal builds contextualized information spaces by (a) enriching the content repository with spatial, semantic, and temporal annotations, and (b) applying semi-automated ontology learning to the repository, yielding a controlled vocabulary that helps structure the stored information. Different portlets visualize aspects of the contextualized information spaces, providing the user with multiple views on the available information. Currently, synchronized semantic maps, domain ontologies, geographic maps, tag clouds, and real-time information retrieval agents that suggest similar topics and nearby locations provide users with important context information, facilitate access to complex datasets, and help users navigate large collections of Web documents.


  • Scharl, Arno; Stern, Hermann; Weichselbraun, Albert (2008): A Geospatial Web Application for Communicating Climate Change. 11th International Conference on Geographic Information Science (AGILE 2008). AGILE Council. GeoVisualization of Dynamics, Movement and Change Workshop. Girona, 2008


  • Scharl, Arno; Dickinger, Astrid; Weichselbraun, Albert (2008): Analyzing news media coverage to acquire and structure tourism knowledge. In: Information Technology & Tourism 10, S. 3-17

     

    Abstract: Destination image significantly influences a tourist’s decision-making process. The impact of news media coverage on destination image has attracted research attention and became particularly evident after catastrophic events such as the 2004 Indian Ocean earthquake that triggered a series of lethal tsunamis. Building upon previous research, this article analyzes the prevalence of tourism destinations among 162 international media sites. Term frequency captures the attention a destination receives—from a general and, after contextual filtering, from a tourism perspective. Calculating sentiment estimates positive and negative media influences on destination image at a given point in time. Identifying semantic associations with the names of countries and major cities, the results of co-occurrence analysis reveal the public profiles of destinations, and the impact of current events on media coverage. These results allow national tourism organizations to assess how their destination is covered by news media in general, and in a specific tourism context. To guide analysts and marketers in this assessment, an iterative analysis of semantic associations extracts tourism knowledge automatically, and represents this knowledge as ontological structures.


  • Scharl, Arno; Weichselbraun, Albert (2008): An Automated Approach to Investigating the Online Media Coverage of U.S. Presidential Elections. In: Journal of Information Technology & Politics 5, S. 121-132. Online verfügbar unter https://doi.org/10.1080/19331680802149582, zuletzt geprüft am 15.01.2021

     

    Abstract: This paper presents the U.S. Election 2004 Web Monitor, a public Web portal that captured trends in political media coverage before and after the 2004 U.S. presidential election. Developed by the authors of this article, the webLyzard suite of Web mining tools provided the required functionality to aggregate and analyze about a half-million documents in weekly intervals. The study paid particular attention to the editorial slant, which is defined as the quantity and tone of a Web site's coverage as influenced by its editorial position. The observable attention and attitude toward the candidates served as proxies of editorial slant. The system identified attention by determining the frequency of candidate references and measured attitude towards the candidate by looking for positive and negative expressions that co-occur with these references. Keywords and perceptual maps summarized the most important topics associated with the candidates, placing special emphasis on environmental issues.


  • Scharl, Arno; Weichselbraun, Albert; Gindl, Stefan (2008) : Building Tagged Linguistic Unit Databases for Sentiment Detection In: Tochtermann, Klaus; Maurer, Hermann (Hg.): Proceedings of the 8th International Conference on Knowledge Management: I-Know '08: Graz, 3. - 5. September

    Abstract: Despite the obvious business value of visualizing similarities between elements of evolving information spaces and mapping these similarities e.g. onto geospatial reference systems, analysts are often more interested in how the semantic orientation (sentiment) towards an organization, a product or a particular technology is changing over time. Unfortunately, popular methods that process unstructured textual material to detect semantic orientation automatically based on tagged dictionaries are not capable of fulfilling this task, even when coupled with part-of-speech tagging, a standard component of most text processing toolkits that distinguishes grammatical categories such as article (AT), noun (NN), verb (VB), and adverb (RB). Small corpus size, ambiguity and subtle incremental change of tonal expressions between different versions of a document complicate the detection of semantic orientation and often prevent promising algorithms from being incorporated into commercial applications. Parsing grammatical structures, by contrast, outperforms dictionary-based approaches in terms of reliability, but usually suffers from poor scalability due to its computational complexity. This paper addresses this predicament by presenting an alternative approach based on automatically building Tagged Linguistic Unit (TLU) databases to overcome the restrictions of dictionaries with a limited set of tagged tokens.
