Biblio
Today, cloud vendors host third party black-box services, whose developers usually provide only textual descriptions or purely syntactical interface specifications. Cloud vendors that give substantial support to other third party developers to integrate hosted services into new software solutions would have a unique selling feature over their competitors. However, to reliably determine if a service is reusable, comprehensive service specifications are needed. Characteristic for comprehensive in contrast to syntactical specifications are the formalization of ontological and behavioral semantics, homogeneity according to a global ontology, and a service grounding that links the abstract service description and its technical realization. Homogeneous, semantical specifications enable to reliably identify reusable services, whereas the service grounding is needed for the technical service integration. In general, comprehensive specifications are not available and have to be derived. Existing automatized approaches are restricted to certain characteristics of comprehensiveness. In my PhD, I consider an automatized approach to derive fully-fledged comprehensive specifications for black-box services. Ontological semantics are derived from syntactical interface specifications. Behavioral semantics are mined from call logs that cloud vendors create to monitor the hosted services. The specifications are harmonized over a global ontology. The service grounding is established using traceability information. The approach enables third party developers to compose services into complex systems and creates new sales channels for cloud and service providers.
Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation. Its knowledge is increasingly used within Wikipedia itself and various other kinds of information systems, imposing high demands on its integrity. Wikidata can be edited by anyone and, unfortunately, it frequently gets vandalized, exposing all information systems using it to the risk of spreading vandalized and falsified information. In this paper, we present a new machine learning-based approach to detect vandalism in Wikidata. We propose a set of 47 features that exploit both content and context information, and we report on 4 classifiers of increasing effectiveness tailored to this learning task. Our approach is evaluated on the recently published Wikidata Vandalism Corpus WDVC-2015 and it achieves an area under curve value of the receiver operating characteristic, ROC-AUC, of 0.991. It significantly outperforms the state of the art represented by the rule-based Wikidata Abuse Filter (0.865 ROC-AUC) and a prototypical vandalism detector recently introduced by Wikimedia within the Objective Revision Evaluation Service (0.859 ROC-AUC).