Companies analyse large amounts of data on clusters of machines using big data analytic tools such as Apache Spark and Apache Flink. These tools are mainly tested for speed and reliability, while security, and thus authentication, receives attention only as an afterthought. In such tools, authentication is provided by the Kerberos protocol, which is essentially layered on top of them. However, Kerberos is vulnerable to attacks and does not provide high availability when users are distributed around the world. To improve authentication, this work first presents an analysis of authentication in Hadoop and in the big data analytic tools. Second, we propose a concept that deploys Transport Layer Security (TLS) not only to secure data in transit but also for authentication within the big data tools. This is achieved by establishing connections with short-lived certificates. The proof of concept is realized in Apache Spark, where Kerberos is replaced by the proposed method. Because the new certificates expire quickly, they are less vulnerable to abuse than long-lived credentials. With our approach, the industry's requirements regarding multi-factor authentication and scalability are met.