Benchmark | CPS-VO

forum

2023 AINNCS Benchmarks

Submitted by ttj on Wed, 03/08/2023 - 5:35pm

We will discuss the 2023 AINNCS benchmarks here. By default, we will plan to reuse the benchmarks from 2022. Details on those are as below.

ARCH-COMP22 Category Report: Artificial Intelligence and Neural Network Control Systems (AINNCS) for Continuous and Hybrid Systems Plants
https://easychair.org/publications/paper/C1J8

2022 Model files:
https://github.com/verivital/ARCH-COMP2022

forum

2022 AINNCS Benchmarks

Submitted by ttj on Thu, 05/19/2022 - 11:56am

We will discuss the 2022 ARCH-COMP AINNCS benchmarks here. We plan to mostly reuse the 2021 benchmarks, which were also mostly used in the 2020 edition.

2021 benchmarks:

https://github.com/verivital/ARCH-COMP2021

2021 report "ARCH-COMP21 Category Report: Artificial Intelligence and Neural Network Control Systems (AINNCS) for Continuous and Hybrid Systems Plants":

https://easychair.org/publications/paper/Jq4h

forum

2021 AINNCS Benchmarks

Submitted by ttj on Wed, 05/05/2021 - 9:58am

We will discuss the 2021 ARCH-COMP AINNCS benchmarks here. As we are a bit late, we plan to mostly reuse the 2020 benchmarks.

forum

Automatic evaluation of benchmarks

Submitted by althoff on Tue, 06/09/2020 - 4:27am

In this froup we discuss how to organize the automatic evaluation of benchmarks. This allows us to run all benchmarks on the same machine so that we can perform the repeatability evaluation and the collection of computation times in one go.

forum

2020 Benchmark Selections

Submitted by ttj on Thu, 06/04/2020 - 4:36pm

Hi everyone,

We need to finalize the benchmarks in the next few days. There have been some, but not a ton of comments. My group has gone through the benchmarks, and based on our discussion/review of them, I suggest using the following with some comments/details needed, please let us know of any concerns.

1) As discussed earlier, include the benchmarks considered last year:

https://github.com/verivital/ARCH-COMP19-AINNCS

2) Include the following proposed benchmarks:

forum

2020 Benchmark Proposal: Linear Systems by Haitham Khedr

Submitted by ttj on Fri, 05/22/2020 - 12:20pm

We'd like to propose a benchmark with linear dynamics. We consider a robot in a structured environment of polytypic obstacles, and we'd like to identify its safe regions of operation within the environment, given a trained NN.

Paper: https://dl.acm.org/doi/pdf/10.1145/3302504.3311802

Code: https://github.com/rcpsl/ReluVerify

forum

2020 Benchmark Proposal: Pendulum and Airplane by Chelsea Sidrane and Amir Maleki

Submitted by ttj on Sat, 05/16/2020 - 4:45pm

Details here with writeup:

https://github.com/amaleki2/benchmark_closedloop_verification

forum

2020 Benchmark Proposal: Artificial Pancreas by Souradeep Dutta

Submitted by ttj on Sat, 05/16/2020 - 4:41pm

The property is that the network should be monotonically negative for insulin inputs. (eg hold all insulin inputs constant with input range 0-0.1, hold a glucose trace constant (given the below criteria)), and as you increase insulin in one location, the predicted output should decrease. Glucose values are allowed to vary [40,400], with a max difference of 25 between two consecutive inputs eg: G(t) vs G(t-5). These networks all take $7$ glucose inputs G(t), G(t-5), ... , G(t-30) and $7$ insulin inputs u(t), .., u(t-30).

forum

2020 Benchmark Proposal: F1/10 Car with LIDAR by Radoslav Ivanov

Submitted by ttj on Fri, 05/15/2020 - 5:36pm

In this benchmark, we consider a 1/10-scale autonomous racing car navigating a hallway environment. The car has a neural network controller that takes a lidar scan of the environment as input and produces steering for the car (assuming constant throttle). The verification problem is to verify that, starting from a set of initial positions, the car will not crash for some amount of time (e.g., 7s). In the simplest version of the problem, there are 21 lidar rays as input to the controller. The car has a standard continuous-time bicycle model.

forum

2020 Benchmark Proposal: Heterogeneous Networks (multiple activation types) by Chao Huang

Submitted by ttj on Thu, 05/14/2020 - 7:37pm

We test our tool ReachNN on the benchmarks that have the same dynamics as the benchmarks proposed in Sherlock, but with different settings of neural-network (NN) controllers. The main difference lies on the NN controller, where the NN controller we use may have different activation functions simultaneously, e.g. we have ReLU+sigmoid NN controller for Ex. \#1. The input dimension of the NN controller, which is the dimension of the system state, ranges from 2 to 4.