A verification benchmark is a model together with a list of properties to be checked on the model.
forum
Submitted by ttj on Wed, 03/08/2023 - 4:35pm
forum
Submitted by ttj on Thu, 05/19/2022 - 10:56am
forum
Submitted by ttj on Wed, 05/05/2021 - 8:58am
We will discuss the 2021 ARCH-COMP AINNCS benchmarks here. As we are a bit late, we plan to mostly reuse the 2020 benchmarks.
forum
Submitted by althoff on Tue, 06/09/2020 - 3:27am
In this froup we discuss how to organize the automatic evaluation of benchmarks. This allows us to run all benchmarks on the same machine so that we can perform the repeatability evaluation and the collection of computation times in one go.
forum
Submitted by ttj on Thu, 06/04/2020 - 3:36pm
Hi everyone,
We need to finalize the benchmarks in the next few days. There have been some, but not a ton of comments. My group has gone through the benchmarks, and based on our discussion/review of them, I suggest using the following with some comments/details needed, please let us know of any concerns.
1) As discussed earlier, include the benchmarks considered last year:
https://github.com/verivital/ARCH-COMP19-AINNCS
2) Include the following proposed benchmarks:
forum
Submitted by ttj on Fri, 05/22/2020 - 11:20am
forum
Submitted by ttj on Sat, 05/16/2020 - 3:45pm
forum
Submitted by ttj on Sat, 05/16/2020 - 3:41pm
The property is that the network should be monotonically negative for insulin inputs. (eg hold all insulin inputs constant with input range 0-0.1, hold a glucose trace constant (given the below criteria)), and as you increase insulin in one location, the predicted output should decrease. Glucose values are allowed to vary [40,400], with a max difference of 25 between two consecutive inputs eg: G(t) vs G(t-5). These networks all take $7$ glucose inputs G(t), G(t-5), ... , G(t-30) and $7$ insulin inputs u(t), .., u(t-30).
forum
Submitted by ttj on Fri, 05/15/2020 - 4:36pm
In this benchmark, we consider a 1/10-scale autonomous racing car navigating a hallway environment. The car has a neural network controller that takes a lidar scan of the environment as input and produces steering for the car (assuming constant throttle). The verification problem is to verify that, starting from a set of initial positions, the car will not crash for some amount of time (e.g., 7s). In the simplest version of the problem, there are 21 lidar rays as input to the controller. The car has a standard continuous-time bicycle model.
forum
Submitted by ttj on Thu, 05/14/2020 - 6:37pm
We test our tool ReachNN on the benchmarks that have the same dynamics as the benchmarks proposed in Sherlock, but with different settings of neural-network (NN) controllers. The main difference lies on the NN controller, where the NN controller we use may have different activation functions simultaneously, e.g. we have ReLU+sigmoid NN controller for Ex. \#1. The input dimension of the NN controller, which is the dimension of the system state, ranges from 2 to 4.