Benchmark

A verification benchmark is a model together with a list of properties to be checked on the model.
forum

Visible to the public 2023 AINNCS Benchmarks

We will discuss the 2023 AINNCS benchmarks here. By default, we will plan to reuse the benchmarks from 2022. Details on those are as below.

ARCH-COMP22 Category Report: Artificial Intelligence and Neural Network Control Systems (AINNCS) for Continuous and Hybrid Systems Plants
https://easychair.org/publications/paper/C1J8

2022 Model files:
https://github.com/verivital/ARCH-COMP2022

forum

Visible to the public 2022 AINNCS Benchmarks

We will discuss the 2022 ARCH-COMP AINNCS benchmarks here. We plan to mostly reuse the 2021 benchmarks, which were also mostly used in the 2020 edition.

2021 benchmarks:

https://github.com/verivital/ARCH-COMP2021

2021 report "ARCH-COMP21 Category Report: Artificial Intelligence and Neural Network Control Systems (AINNCS) for Continuous and Hybrid Systems Plants":

https://easychair.org/publications/paper/Jq4h

forum

Visible to the public 2021 AINNCS Benchmarks

We will discuss the 2021 ARCH-COMP AINNCS benchmarks here. As we are a bit late, we plan to mostly reuse the 2020 benchmarks.

forum

Visible to the public Automatic evaluation of benchmarks

In this froup we discuss how to organize the automatic evaluation of benchmarks. This allows us to run all benchmarks on the same machine so that we can perform the repeatability evaluation and the collection of computation times in one go.

forum

Visible to the public 2020 Benchmark Selections

Hi everyone,

We need to finalize the benchmarks in the next few days. There have been some, but not a ton of comments. My group has gone through the benchmarks, and based on our discussion/review of them, I suggest using the following with some comments/details needed, please let us know of any concerns.

1) As discussed earlier, include the benchmarks considered last year:

https://github.com/verivital/ARCH-COMP19-AINNCS

2) Include the following proposed benchmarks:

forum

Visible to the public 2020 Benchmark Proposal: Linear Systems by Haitham Khedr

We'd like to propose a benchmark with linear dynamics. We consider a robot in a structured environment of polytypic obstacles, and we'd like to identify its safe regions of operation within the environment, given a trained NN.

Paper: https://dl.acm.org/doi/pdf/10.1145/3302504.3311802

Code: https://github.com/rcpsl/ReluVerify

forum

Visible to the public 2020 Benchmark Proposal: Artificial Pancreas by Souradeep Dutta

The property is that the network should be monotonically negative for insulin inputs. (eg hold all insulin inputs constant with input range 0-0.1, hold a glucose trace constant (given the below criteria)), and as you increase insulin in one location, the predicted output should decrease. Glucose values are allowed to vary [40,400], with a max difference of 25 between two consecutive inputs eg: G(t) vs G(t-5). These networks all take $7$ glucose inputs G(t), G(t-5), ... , G(t-30) and $7$ insulin inputs u(t), .., u(t-30).

forum

Visible to the public 2020 Benchmark Proposal: F1/10 Car with LIDAR by Radoslav Ivanov

In this benchmark, we consider a 1/10-scale autonomous racing car navigating a hallway environment. The car has a neural network controller that takes a lidar scan of the environment as input and produces steering for the car (assuming constant throttle). The verification problem is to verify that, starting from a set of initial positions, the car will not crash for some amount of time (e.g., 7s). In the simplest version of the problem, there are 21 lidar rays as input to the controller. The car has a standard continuous-time bicycle model.

forum

Visible to the public 2020 Benchmark Proposal: Heterogeneous Networks (multiple activation types) by Chao Huang

We test our tool ReachNN on the benchmarks that have the same dynamics as the benchmarks proposed in Sherlock, but with different settings of neural-network (NN) controllers. The main difference lies on the NN controller, where the NN controller we use may have different activation functions simultaneously, e.g. we have ReLU+sigmoid NN controller for Ex. \#1. The input dimension of the NN controller, which is the dimension of the system state, ranges from 2 to 4.