Biblio
This paper introduces typy, a statically typed programming language embedded by reflection into Python. typy features a fragmentary semantics, i.e. it delegates semantic control over each term, drawn from Python's fixed concrete and abstract syntax, to some contextually relevant user-defined semantic fragment. The delegated fragment programmatically 1) typechecks the term (following a bidirectional protocol); and 2) assigns dynamic meaning to the term by computing a translation to Python. We argue that this design is expressive with examples of fragments that express the static and dynamic semantics of 1) functional records; 2) labeled sums (with nested pattern matching a la ML); 3) a variation on JavaScript's prototypal object system; and 4) typed foreign interfaces to Python and OpenCL. These semantic structures are, or would need to be, defined primitively in conventionally structured languages. We further argue that this design is compositionally well-behaved. It avoids the expression problem and the problems of grammar composition because the syntax is fixed. Moreover, programs are semantically stable under fragment composition (i.e. defining a new fragment will not change the meaning of existing program components.)
Syntax extension mechanisms are powerful, but reasoning about syntax extensions can be difficult. Recent work on type-specific languages (TSLs) addressed reasoning about composition, hygiene and typing for extensions introducing new literal forms. We supplement TSLs with typed syntax macros (TSMs), which, unlike TSLs, are explicitly invoked to give meaning to delimited segments of arbitrary syntax. To maintain a typing discipline, we describe two avors of term-level TSMs: synthetic TSMs specify the type of term that they generate, while analytic TSMs can generate terms of arbitrary type, but can only be used in positions where the type is otherwise known. At the level of types, we describe a third avor of TSM that generates a type of a specified kind along with its TSL and show interesting use cases where the two mechanisms operate in concert.
Programming languages often include specialized syntax for common
datatypes (e.g. lists) and some also build in support for specific specialized
datatypes (e.g. regular expressions), but user-defined types must use generalpurpose
syntax. Frustration with this causes developers to use strings, rather than
structured data, with alarming frequency, leading to correctness, performance,
security, and usability issues. Allowing library providers to modularly extend a
language with new syntax could help address these issues. Unfortunately, prior
mechanisms either limit expressiveness or are not safely composable: individually
unambiguous extensions can still cause ambiguities when used together.
We introduce type-specific languages (TSLs): logic associated with a type that
determines how the bodies of generic literals, able to contain arbitrary syntax,
are parsed and elaborated, hygienically. The TSL for a type is invoked only
when a literal appears where a term of that type is expected, guaranteeing noninterference.
We give evidence supporting the applicability of this approach and
formally specify it with a bidirectionally typed elaboration semantics for the
Wyvern programming language.
Web applications must ultimately command systems like web browsers and database engines using strings. Strings derived from improperly sanitized user input can thus be a vector for command injection attacks. In this paper, we introduce regular string types, which classify strings known statically to be in a specified regular language. These types come equipped with common operations like concatenation, substitution and coercion, so they can be used to implement, in a conventional manner, the portions of a web application or application framework that must directly construct com- mand strings. Simple type annotations at key interfaces can be used to statically verify that sanitization has been per- formed correctly without introducing redundant run-time checks. We specify this type system in a minimal typed lambda calculus, λRS.
To be practical, adopting a specialized type system like this should not require the adoption of a new programming language. Instead, we advocate for extensible type systems: new type system fragments like this should be implemented as libraries atop a mechanism that guarantees that they can be safely composed. We support this with two contribu- tions. First, we specify a translation from λRS to a language fragment containing only standard strings and regular ex- pressions. Second, taking Python as a language with these constructs, we implement the type system together with the translation as a library using atlang, an extensible static type system for Python being developed by the authors.
Injection vulnerabilities have topped rankings of the most critical web application vulnerabilities for several years [1, 2]. They can occur anywhere where user input may be erroneously executed as code. The injected input is typically aimed at gaining unauthorized access to the system or to private information within it, corrupting the system's data, or disturbing system availability. Injection vulnerabilities are tedious and difficult to prevent.