Benchmarks in a Standard Form

xinchen

Dear Organizers,

I would be very grateful if you could provide all benchmarks in a standard form from which hybrid automata can be obtained easily. This would not only save a lot of time for the tool authors but also ensure that we are all using the same benchmarks.

I have just checked all of the benchmark papers in ARCH and found that the models are provided in files of various types. I am not sure whether all of them can be modeled equivalently as hybrid automata. If not, would it be possible to slightly adapt those models so that they can be handled by most of the tools?

Thanks for the help.

Xin

stanleybak
format

I completely agree. The status document that Goran sent out says:

Where possible, the problem instance should originate in one of the benchmarks in the
ARCH repository (http://cps-vo.org/group/ARCH/benchmarks) and be specified in the SpaceEx file format, consisting of an XML file for the system and a CFG file for the specification (forbidden states). ( http://spaceex.imag.fr/documentation/user-documentation/spaceex-modeling-language-33 ).

Although SpaceEx itself cannot currently run nonlinear systems, the model editor and format can still be used to create the models in a hopefully unambiguous way. Hyst can then be used to translate the models from this format; a printer for Flow*, for example, already exists. I can also help with creating a Hyst printer for a specific tool.
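For reference, the CFG side of a problem instance is just a list of key-value settings. A rough sketch, with made-up variable names and values (see the SpaceEx documentation linked above for the full option list), could look like:

    system = my_benchmark_sys
    initially = "x == 0 & v == 1 & loc(plant) == mode1"
    forbidden = "x >= 4.5"
    scenario = supp
    directions = box
    sampling-time = 0.01
    time-horizon = 10
    iter-max = 20

Tools other than SpaceEx would only read the model, the initial states, and the forbidden states, and can ignore the SpaceEx-specific algorithm settings.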

In terms of model adaptation, I think it's certainly allowed. The hope is that the benchmarks originate from the ARCH set, but small deviations are probably not an issue. Just point out what has been changed in your benchmark proposal topic.

What do you think, Xin?

xinchen
benchmarks

Thanks, Stan. I actually do not think that modeling a nonlinear system in the SpaceEx format is a problem, since there is Hyst, and even manual translation is not hard. But I am wondering whether the organizers could provide some standard benchmarks for us. We should also consider benchmarks of various categories: linear or nonlinear dynamics, with or without disturbances, and with or without a fixed-point check. It is also necessary to provide several safety verification tasks of different difficulty for each system.

I will do so in my benchmark proposal.

stanleybak
Benchmarks Response

I believe the intention/hope was (Goran or Matthias can correct me if I'm wrong) that each participant would provide a benchmark in the SpaceEx format based on one of the ARCH benchmarks. The category of the benchmark (whether it's nonlinear or linear, bounded depth, fixpoints, or jump/time bound) depends on the category in which their tool is participating. They would then provide two safety specifications for the model ("problem instances"): one which is safe, and one which is unsafe.
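Concretely, the two instances could share the same XML model and differ only in the forbidden states of the CFG file, e.g. (illustrative threshold values):

    forbidden = "x >= 10"    (safe instance)
    forbidden = "x >= 2"     (unsafe instance)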

frehse
benchmark selection

@Stan, thank you for your comments; you're absolutely right.

For the competition it would be nice if every tool could run on at least one problem instance, and if every problem instance had at least one tool running on it. Both criteria are satisfied if every tool owner proposes a problem instance that he/she likes and that runs on his/her tool.

sschupp
Config file transformation

I am mostly using Hyst to transform the provided benchmarks to Flow* syntax, which my tool can parse. Regarding the configuration/settings, I think there are some problems concerning the translation:

I was wondering how we should translate the number of iterations correctly. As far as I know, the number of iterations in SpaceEx denotes the number of discrete post computations, i.e., the number of jumps taken. Other tools instead specify a limit on the jump depth or on the number of computed flowpipes. In any case, is it okay to adjust the jump depth to a value such that the number of computed discrete post successors is roughly the same as the specified iteration bound?
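To make this concrete: if each symbolic state has at most b discrete successors, then a jump-depth bound of d allows up to 1 + b + b^2 + ... + b^d discrete post computations (15 for b = 2 and d = 3), so one could pick d such that this count roughly matches the specified iteration bound.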

stanleybak
Manual Settings are Okay

In my opinion, "number of iterations" is a setting made up by verification tool authors and is not typically part of the verification problem being solved. For this reason, it's okay to manually adjust this setting based on what the particular tool is using.

frehse
It should be in the spec -- in a meaningful way

Stan is absolutely right that the "number of iterations" setting in SpaceEx is made up and does not easily translate to search depth. Personally, I don't like "search depth" as a criterion since it depends on how things are modeled. Adding a self-loop may completely change the results. So the spec should be something that has meaning on the model level, like the number of transitions with a given label. In SpaceEx, this can then be verified by including a monitor automaton that counts the number of transitions.
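As a rough sketch (component, variable, and label names are placeholders, and this is untested), such a monitor in the SpaceEx XML format could be a single location with a counter c that is constant over time and is incremented on every transition carrying the shared label:

    <component id="monitor">
      <param name="c" type="real" local="false" d1="1" d2="1" dynamics="any" />
      <param name="jump" type="label" local="false" />
      <location id="1" name="counting">
        <flow>c' == 0</flow>
      </location>
      <transition source="1" target="1">
        <label>jump</label>
        <assignment>c' == c + 1</assignment>
      </transition>
    </component>

The bound "at most N jumps" then becomes a constraint like c <= N, which is meaningful at the model level no matter how a tool implements its search.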
