Abstract | Extracting knowledge from unstructured data silos, a legacy of old applications, is mandatory for improving the governance of today's cities and fostering the creation of smart cities. Texts in natural language often compose such data. Nevertheless, the inference of useful information from a linguistic-computational analysis of natural language data is an open challenge. In this paper, we propose a clustering method to analyze textual data employing the unsupervised machine learning algorithms k-means and hierarchical clustering. We assess different vector representation methods for text, similarity metrics, and the number of clusters that best matches the data. We evaluate the methods using a real database of a public record service of security occurrences. The results show that the k-means algorithm using Euclidean distance extracts non-trivial knowledge, reaching up to 93% accuracy in a set of test samples while identifying the 12 most prevalent occurrence patterns. |