forked from bikol/DPRI_doc_20-21
MealyCompiler (Solomonoff) improved documentation
parent
7a90f34370
commit
5070567b40
@ -1,14 +1,14 @@
|
|||||||
# Dokument wizji projektu
|
# Project Vision Document
|
||||||
|
|
||||||
#### Nazwa projektu: Solomonoff
|
#### Project name: MealyCompiler (Solomonoff)
|
||||||
|
|
||||||
#### Autorzy: Aleksander Mendoza, Bogdan Bondar, Marcin Jabłoński
|
#### Authors: Aleksander Mendoza, Bogdan Bondar, Marcin Jabłoński
|
||||||
|
|
||||||
#### Data: 13.12.2020
|
#### Date: 13.12.2020
|
||||||
|
|
||||||
### 1\. Executive summary
|
### 1\. Executive summary
|
||||||
|
|
||||||
This project focuses on research in the field of automata theory and inductive inference. While many existing libraries already provide support for general purpose automata and implement various related algorithms, this project takes a slightly different approach. The primary tool for working with the library, is through doman specific language of regular expressions. Most of the things can be done without writing even a single line of Java code.
|
This project focuses on research in the field of automata theory and inductive inference. While many existing libraries already provide support for general-purpose automata and implement various related algorithms, this project takes a slightly different approach. The primary way of working with the library is through a domain-specific language of regular expressions. Most tasks can be done without writing a single line of Java code. We provide a command-line interface, a web interface and an integrated build system.
|
||||||
|
|
||||||
The main applications concern formal methods, natural language processing, state-based system modeling, pattern recognition, inductive inference and machine learning. It can be of great help to researchers and can also be used on an industrial scale.
|
The main applications concern formal methods, natural language processing, state-based system modeling, pattern recognition, inductive inference and machine learning. It can be of great help to researchers and can also be used on an industrial scale.
|
||||||
|
|
||||||
@ -16,51 +16,42 @@ The greatest competitor for Solomonoff is Google's OpenFST project. Solomonoff i
|
|||||||
|
|
||||||
### 2\. Goal and target audience
|
### 2\. Goal and target audience
|
||||||
|
|
||||||
The goal is provide an better alternative for OpenFST. Solomonoff strives to provide improvement in the following domains:
|
The goal is to provide a better alternative to OpenFST.
|
||||||
|
|
||||||
- OpenFst has Matcher that was meant to compactify ranges. In Solomonoff all transitions are ranged and follow the theory of symbolic automata. They are well integrated with regular expressions and Glushkov's construction. They allow for more efficient squaring and subset construction. Instead of being an ad-hoc feature, they are well integrated everywhere.
|
OpenFst was a niche project with sparse documentation and a neglected user interface. Everything is done primarily in C++ with templates, the command-line interface is basic, the regular expression language has many fundamental flaws, and the build system consists of a tool for generating Makefiles that only call the compiler. Solomonoff strives to improve on this with the following features:
|
||||||
|
|
||||||
- OpenFst has no built-in support for regular expression and it was added only later in form of Thrax grammars, that aren't much more than another API for calling library functions. In Solomonoff the regular expressions are the library. Instead of having separate procedures for union, concatenation and Kleene closure, there is only one procedure that takes arbitrary regular expression and compiles it in batch. This way everything works much faster, doesn't lead to introduction of any ε-transitions (in Solomonoff, ε-transitions aren't even implemented, because they were never needed thanks to Glushkov's construction). This leads to significant differences in performance. You can see benchmarks below.
|
- We have an online REPL with examples and guides where everything works out of the box. Everything can be done with our rich and carefully designed regular expression language, so there is no need to write code.
|
||||||
|
|
||||||
- Many operations are implemented in "better" way. For example
|
- We provide extensive documentation in the form of three scientific papers, detailed technical documentation of the compiler implementation, an extensive GitHub page and an interactive REPL tutorial.
|
||||||
|
|
||||||
- Solomonoff has no need for ArcSort because all arcs are always sorted
|
- We ship with an integrated build system that supports parallelism and interacts with the compiler directly for optimum performance. It can be configured in TOML build files.
|
||||||
|
|
||||||
- Solomonoff has no Optimise because compiler decides much better when to optimise things
|
- Everything can be done in regular expressions and there is no need to interact with the compiler's API (although we do provide one as an addition).
|
||||||
|
|
||||||
- Solomonoff has no RmEpsilon, because it has no epsilons in the first place
|
- There are plenty of technical improvements, performance optimisations and innovative algorithms. All of them are described in detail in our documentation. We also provide performance benchmarks on our GitHub.
|
||||||
|
|
||||||
- Solomonoff has no CDRewrite because the same effect can be achieved much more efficiently with lexicographic weights and Kleene closure.
|
- Unlike any other existing automata tool, we ship with out-of-the-box functions for inductive inference.
|
||||||
|
|
||||||
- In OpenFST if you perform `("a":"b"):"c"` you get as a result `"b":"c"`. OpenFST treats `:` as a binary operation. Solomonoff, on the other hand, treats `:` as a unary operation, so `("a":"b"):"c"` results in `"a":"bc"`. In fact `:'b'` is treated as `'':'b'`, and `'a':'b'` is merely syntactic sugar for `'a' '':'b'`. You can write, for example, `:'b' 'a'`. The strings prefixed with `:` are the output strings, whereas strings without `:` are the input strings. You can very easily invert an automaton by turning input strings into output strings and vice versa. For example, in Thrax you have `Inverse["a":"b"]`, which results in `"b":"a"`. In Solomonoff the inverse of `'a':'b'` becomes `:'a' 'b'`. Very straightforward. The semantics of `:` in OpenFST strip the output of the left-hand side. In fact, `X:Y` in OpenFST translates more literally to `stripOutput[X]:Y` in Solomonoff.
|
|
||||||
|
|
||||||
- Thrax supports so-called "temporary"/"outside of alphabet" symbols. Any time you write `"[NEW_SYMBOL]"` it will take some large Unicode codepoint, hoping that it's not used anywhere else. In Solomonoff this is not necessary, because the type system is much better suited for this task. You can just write
|
|
||||||
|
|
||||||
|
|
||||||
x = 'abcd' // some regex
|
The project found approval among members of Samsung's R&D team for Bixby development and inductive inference researchers from Dortmund Technical University, Germany. Our system will be deployed at Samsung, our website can help linguists learn Solomonoff easily, and our build system should make command-line usage more accessible.
|
||||||
x <: [a-z]* //ensure that it uses specific alphabet
|
|
||||||
z = x <404> x // use codepoint 404 as some "external symbol"
|
|
||||||
z <: ([a-z]|<404>)*
|
|
||||||
|
|
||||||
The description of the project's purpose should in particular answer the following questions:
|
The main products are:
|
||||||
|
|
||||||
- Solomonoff is much faster. Results of benchmarks on a large linguistic dictionary are as follows. Thrax compiles in 19 minutes, while Solomonoff takes just 4 seconds. OpenFST executes in 250 milliseconds, while Solomonoff takes 5 milliseconds. The FAR file takes up 27M, while Solomonoff's archive takes only 552K. OpenFST takes up 6336K in RAM, while Solomonoff takes only 738K. OpenFST is written in C++, while Solomonoff is written in Java, so our implementation performs better despite being disadvantaged.
|
- compiler backend - the core of the project
|
||||||
|
- commandline interface with build system - allows more technical users to automate Solomonoff with shell scripts
|
||||||
|
- website and REPL - allows less technical users to experience Solomonoff without much friction.
|
||||||
|
|
||||||
The project found approval among members of Samsung's R&D team for Bixby development and inductive inference researchers from Dortmund Technical University, Germany.
|
Our project brings performance, which was measured with benchmarks (available on GitHub). We provide innovation, visible in the number of features Solomonoff implements that are not available in any other existing tool (more information in the scientific papers). We are user-friendly, which we assessed during usability tests and by collecting feedback from end users.
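To put the benchmark figures quoted in this section into perspective, the ratios can be worked out directly. The sketch below only does that arithmetic; the class and method names are illustrative, and it assumes that "27M" means 27 MiB and that "K" means KiB.

```java
import java.util.Locale;

final class BenchmarkRatios {
    /** How many times larger the baseline measurement is than the candidate. */
    static double ratio(double baseline, double candidate) {
        return baseline / candidate;
    }

    public static void main(String[] args) {
        // Figures as quoted in this document (OpenFST/Thrax first, Solomonoff second).
        double compilation = ratio(19 * 60, 4);   // seconds
        double execution = ratio(250, 5);         // milliseconds
        double archive = ratio(27 * 1024, 552);   // KiB, assuming 27M = 27 MiB
        double memory = ratio(6336, 738);         // KiB

        System.out.printf(Locale.ROOT, "compilation: %.1fx%n", compilation);
        System.out.printf(Locale.ROOT, "execution:   %.1fx%n", execution);
        System.out.printf(Locale.ROOT, "archive:     %.1fx%n", archive);
        System.out.printf(Locale.ROOT, "memory:      %.1fx%n", memory);
    }
}
```

Under those assumptions the compilation gap is the largest of the four, and the memory gap the smallest.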
|
||||||
|
|
||||||
### 3\. Market
|
### 3\. Market
|
||||||
|
|
||||||
Currently there exists only one serious alternative, which is the openfst library with its Thrax extension for writing regex-like grammars. Their solution has numerous problems. Its focus on a probabilistic approach to modeling nondeterminism made the library quite slow. It also became a double-edged sword, by making rule-based systems difficult to maintain (the compiler doesn't warn the programmer when nondeterminism causes some rules to overshadow others). Compilation of grammars is lacking in many aspects. The grammar expression language is very basic and obscure. The compiler is not parallelized and is highly inefficient. On top of that, the probabilistic approach.
|
Currently there exists only one serious alternative, which is the `OpenFST` library with its Thrax extension for writing regex-like grammars. Their solution has numerous problems. Its focus on a probabilistic approach to modeling nondeterminism made the library quite slow. It also became a double-edged sword, by making rule-based systems difficult to maintain (the compiler doesn't warn the programmer when nondeterminism causes some rules to overshadow others). Compilation of grammars is lacking in many aspects. The grammar expression language is very basic and obscure. The compiler is not parallelized and is highly inefficient. On top of that, the probabilistic approach.
|
||||||
|
|
||||||
Our solution completely gets rid of nondeterminism. We will not attempt to model any probabilistic models. We will focus more heavily on making the expression language user-friendly and helpful in detecting potential non-determinism. The compiler should be able to process multiple rules in parallel to make compilation time faster. We could also take advantage of guarantees of determinism. There are many optimisations to be made thanks to this. The end goal is to make the whole system work in linear time and linear space.
|
- Our solution completely gets rid of nondeterminism. We do not attempt to build any probabilistic models. We focus more heavily on making the expression language user-friendly and helpful in detecting potential nondeterminism. The compiler should be able to process multiple rules in parallel to shorten compilation time. We implement Glushkov's construction and take advantage of special properties of functional transducers. For better user experience and easier management of code, we provide an integrated build system and an online REPL with extensive documentation.
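As a rough illustration of what detecting potential nondeterminism can mean for ranged transitions, the sketch below flags a state whose outgoing input ranges overlap. It is a simplified assumption-laden example, not Solomonoff's actual algorithm: the `Range` class and `hasOverlap` method are hypothetical names, and the check looks at a single state only, which is stricter than a check that tolerates weighted or functional nondeterminism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OverlapCheck {
    /** Hypothetical ranged transition: symbols from..to (inclusive) lead to state target. */
    static final class Range {
        final int from, to, target;
        Range(int from, int to, int target) {
            this.from = from;
            this.to = to;
            this.target = target;
        }
    }

    /** Returns true if two transitions leaving the same state share at least one input symbol. */
    static boolean hasOverlap(List<Range> outgoing) {
        List<Range> sorted = new ArrayList<>(outgoing);
        sorted.sort(Comparator.comparingInt((Range r) -> r.from));
        long furthestEnd = Long.MIN_VALUE; // largest upper bound seen so far
        for (Range r : sorted) {
            // Sorted by lower bound, so an overlap exists exactly when a range
            // starts at or before the furthest upper bound among earlier ranges.
            if (r.from <= furthestEnd) {
                return true;
            }
            furthestEnd = Math.max(furthestEnd, r.to);
        }
        return false;
    }

    public static void main(String[] args) {
        List<Range> ambiguous = Arrays.asList(new Range('a', 'm', 1), new Range('k', 'z', 2));
        List<Range> disjoint = Arrays.asList(new Range('a', 'm', 1), new Range('n', 'z', 2));
        System.out.println(hasOverlap(ambiguous));
        System.out.println(hasOverlap(disjoint));
    }
}
```

Sorting first keeps the check at O(n log n) per state instead of comparing every pair of transitions.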
|
||||||
|
|
||||||
|
|
||||||
For better user experience and easier management of code, we add build system and basic package manager (although, we don't provide centralised package repository)
|
|
||||||
|
|
||||||
Our project has been noticed by other researchers. We will work together with Dortmund University and make Solomonoff publicly available.
|
There used to be another similar project, developed by AT&T, but it has long been discontinued and replaced by OpenFST. Aside from those two competitors, the market is very niche. There are no other tools (or at least none that are publicly available) strictly for compiling transducers. The closest other competitors might include general-purpose automata libraries like Google's `RE2` or Anders Møller's `BRICS` library. However, those projects are fundamentally different, as they implement classical automata instead of transducers. As a result, Solomonoff is strictly more powerful and superior to those solutions. There is no other automata compiler with support for inductive inference. There is no other symbolic transducer library (although there exist general-purpose libraries for symbolic automata, like `symbolicautomata`, `Microsoft Automata Library`, `Rex`, `Bex`, `Fast`, `Mona`).
|
||||||
|
|
||||||
|
|
||||||
There used to be another similar project, developed by AT&T, but it has long been discontinued and replaced by OpenFST. Aside from that, the market is extremely niche. There are no other tools (or at least none that are publicly available). The closest other competitors might include general-purpose automata libraries like Google's RE2 or Anders Møller's BRICS library. However, those projects are fundamentally different, as they implement classical automata instead of transducers.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@ -68,20 +59,12 @@ There used to be another similar project, developed by AT&T but it's been long d
|
|||||||
|
|
||||||
A simple and efficient library written in C will be the main and primary component of our product. On top of that, it will have a command-line interface equipped with a compiler. For easy and quick access, we should support an online REPL for all curious people who want to give our library a try. The compiler should support parallelism, warn the user about nondeterminism and possibly allow some extent of generic programming (by defining functions working on regular expressions or bulk generation of rules according to some regularities). We should, however, pay extra attention not to make this language Turing-complete/undecidable by accident (otherwise compilation might never end).
|
A simple and efficient library written in C will be the main and primary component of our product. On top of that, it will have a command-line interface equipped with a compiler. For easy and quick access, we should support an online REPL for all curious people who want to give our library a try. The compiler should support parallelism, warn the user about nondeterminism and possibly allow some extent of generic programming (by defining functions working on regular expressions or bulk generation of rules according to some regularities). We should, however, pay extra attention not to make this language Turing-complete/undecidable by accident (otherwise compilation might never end).
|
||||||
|
|
||||||
- simple and efficient library written in Java
|
- simple and efficient compiler-backend written in Java
|
||||||
- essential functions for operations of
|
- regular expressions (concatenation, Kleene closure, union, composition, projection, inverse, difference)
|
||||||
- concatenation
|
- algorithms of inductive inference
|
||||||
- Kleene closure
|
|
||||||
- union
|
|
||||||
- composition
|
|
||||||
- projection
|
|
||||||
- additional less important operations
|
|
||||||
- inverse
|
|
||||||
- composition
|
|
||||||
- algorithms of inference
|
|
||||||
- type system
|
- type system
|
||||||
- integration with LearnLib
|
- integration with LearnLib
|
||||||
- is optimised for functional ranged transducers (so called symbolic automata)
|
- is optimised for functional ranged transducers (symbolic automata)
|
||||||
|
|
||||||
- REPL and build system
|
- REPL and build system
|
||||||
- support for parallelism
|
- support for parallelism
|
||||||
@ -90,24 +73,27 @@ A simple and efficient library written in C will be the main and primary compone
|
|||||||
- dependency resolver
|
- dependency resolver
|
||||||
- supports everything that compiler does
|
- supports everything that compiler does
|
||||||
- additional directives
|
- additional directives
|
||||||
|
- TOML configurations
|
||||||
|
|
||||||
- online repl
|
- online repl and interactive tutorial
|
||||||
- can write regular expressions on-the-fly
|
- can write regular expressions on-the-fly
|
||||||
- can test for determinism without constructing automata (it's more efficient)
|
- has all functions of the compiler
|
||||||
- user can download effects of their work for their local computer
|
- saves work of user (cookies and session)
|
||||||
|
- syntax highlighting
|
||||||
|
- user can download the effects of their work for their local computer
|
||||||
|
- visualizes graphs of automata
|
||||||
|
- provides technical documentation
|
||||||
|
|
||||||
Proposed architecture (one of possible ways we could implement it):
|
Project architecture and key components:
|
||||||
|
|
||||||
- there is all theoretical work and background written in our PDF
|
- there is all theoretical work and background written in our PDF
|
||||||
- theoretical paper serves as basis for formal specification of library functions
|
- theoretical paper serves as basis for formal specification of library functions
|
||||||
- Java compiler - everything can be done from regular expressions. The user does not need to write a single line of Java to use the compiler.
|
- Java compiler uses as few libraries as possible: ANTLR for parsing, LearnLib for inductive inference
|
||||||
- REPL is extensions of compiler itself and shares codebase
|
- REPL is developed on top of compiler backend and is not included in the compiler itself (although compiler provides certain facilities necessary for implementing REPL)
|
||||||
- build system is developed and shipped independently from compiler
|
- build system is developed and shipped independently from compiler backend
|
||||||
package manager is integrated with build system and shares common codebase
|
- online REPL with backend in Spring, uses compiler's Java API.
|
||||||
- online REPL is built by compiling Java compiler with JWebAssembly
|
- the REPLs used by the build system and the web browser have overlapping features, but due to inherent differences between the two environments their implementations differ a little.
|
||||||
- backend in Spring
|
- website contains documentation, examples and tutorials
|
||||||
- website with basic info, documentation, examples and tutorials
|
|
||||||
YouTube channel and blog contain additional resources. They help us to promote the library and attract users.
|
|
||||||
|
|
||||||
Our target audiences include:
|
Our target audiences include:
|
||||||
|
|
||||||
@ -126,8 +112,6 @@ who might use it as one of their tools. (Especially if their employees are any o
|
|||||||
### 5\. Scope and limitations
|
### 5\. Scope and limitations
|
||||||
|
|
||||||
|
|
||||||
Time is our most valuable resource. If we start running out of it, we might drop support for formal specification and/or transducers. By the end of semester we should have working library, although there is no guarantee that it will be optimised. Optimisations should be ready by the end of second semester, though. There should also be a working basic version of expression language and compiler for it by the end of first semester. We should also have more-or-less working prototype of online repl, although the extent of what "working" means depends heavily on state of library.
|
|
||||||
|
|
||||||
Work schedule:
|
Work schedule:
|
||||||
|
|
||||||
- first semester
|
- first semester
|
||||||
@ -141,17 +125,29 @@ Work schedule:
|
|||||||
- online repl integrated with compiler
|
- online repl integrated with compiler
|
||||||
- machine learning algorithms
|
- machine learning algorithms
|
||||||
- build system
|
- build system
|
||||||
- packaging system
|
|
||||||
|
|
||||||
Team:
|
Team:
|
||||||
|
|
||||||
- Aleksander Mendoza
|
- Aleksander Mendoza
|
||||||
- formal specification and theoretical foundations
|
- formal specification and theoretical foundations
|
||||||
- compiler implementation (Java)
|
- compiler implementation (Java)
|
||||||
|
- unit tests (JUnit)
|
||||||
- Bogdan Bondar
|
- Bogdan Bondar
|
||||||
- web design (Bootstrap)
|
- web design (Bootstrap)
|
||||||
- backend development (Spring)
|
- backend development (Spring)
|
||||||
|
- integration tests (Selenium)
|
||||||
- Marcin Jabłoński
|
- Marcin Jabłoński
|
||||||
- build system (Java)
|
- build system (Java)
|
||||||
- REPL (ANTLR)
|
- REPL (ANTLR)
|
||||||
- assisting with compiler implementation
|
- assisting with compiler implementation (initial but discontinued C version)
|
||||||
|
|
||||||
|
Limitations:
|
||||||
|
|
||||||
|
- no official support for mobile devices. Website is responsive but no mobile-optimised version was added. End users should not code on their smartphones, which is not a common practice either way.
|
||||||
|
- no/limited compiler backend for embedded devices. While JavaME does exist, the compiler was not meant to be deployed in embedded systems.
|
||||||
|
- no (centralised) package manager for build-system. Developing package manager infrastructure is a prohibitively expensive operation.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@ -1,31 +1,44 @@
|
|||||||
# Project Vision Document
|
# Project Requirements Document
|
||||||
|
|
||||||
### Project name: Solomonoff
|
### Project name: MealyCompiler (Solomonoff)
|
||||||
|
|
||||||
### Authors: Aleksander Mendoza, Bogdan Bondar, Marcin Jabłoński
|
### Authors: Aleksander Mendoza, Bogdan Bondar, Marcin Jabłoński
|
||||||
|
|
||||||
### Date: 13.12.2020
|
### Date: 8.01.2021
|
||||||
|
|
||||||
#### 0\. Document version
|
#### 0\. Document version
|
||||||
|
|
||||||
- 13.12.2020 - initial version
|
- 13.12.2020 - initial version
|
||||||
|
- 8.01.2021 - minor improvements and final touch
|
||||||
|
|
||||||
#### 1\. Project's components (project's products)
|
#### 1\. Project's components (project's products)
|
||||||
|
|
||||||
- simple and efficient library written in Java
|
|
||||||
- essential functions for operations of
|
Done in first semester:
|
||||||
- concatenation
|
|
||||||
- Kleene closure
|
- C compiler backend (discontinued, because requirements shifted more towards Java)
|
||||||
- union
|
- regular expression (concatenation, Kleene closure, union, output)
|
||||||
- composition
|
|
||||||
- projection
|
- Java prototype
|
||||||
- additional less important operations
|
- regular expressions (concatenation, Kleene closure, union)
|
||||||
- inverse
|
- type system
|
||||||
- composition
|
|
||||||
- algorithms of inference
|
- Online REPL prototype
|
||||||
|
- runs C backend in WebAssembly
|
||||||
|
- Ace editor with syntax highlighting
|
||||||
|
|
||||||
|
Second semester:
|
||||||
|
|
||||||
|
|
||||||
|
- simple and efficient compiler-backend written in Java
|
||||||
|
- regular expressions (concatenation, Kleene closure, union, output, composition, projection, inverse, difference)
|
||||||
|
- algorithms of inductive inference
|
||||||
- type system
|
- type system
|
||||||
- integration with LearnLib
|
- integration with LearnLib
|
||||||
- is optimised for functional ranged transducers (so called symbolic automata)
|
- is optimised for functional ranged transducers (symbolic automata)
|
||||||
|
- parser in ANTLR
|
||||||
|
|
||||||
|
|
||||||
- REPL and build system
|
- REPL and build system
|
||||||
- support for parallelism
|
- support for parallelism
|
||||||
- non-determinism warnings
|
- non-determinism warnings
|
||||||
@ -33,25 +46,72 @@
|
|||||||
- dependency resolver
|
- dependency resolver
|
||||||
- supports everything that compiler does
|
- supports everything that compiler does
|
||||||
- additional directives
|
- additional directives
|
||||||
- online repl
|
- TOML configurations
|
||||||
- has all features of the compiler itself
|
|
||||||
- user can download effects of their work for their local computer
|
- online repl and interactive tutorial
|
||||||
- syntax highlighting
|
- can write regular expressions on-the-fly and has all functions of the compiler (REST calls to Spring backend, which calls compiler Java API)
|
||||||
- documentation and code samples
|
- saves work of user (cookies and session)
|
||||||
|
- syntax highlighting (Ace editor)
|
||||||
|
- user can download the effects of their work for their local computer
|
||||||
|
- visualizes graphs of automata (uses viz.js)
|
||||||
|
- provides technical documentation (formulas with MathJax)
|
||||||
|
- tests
|
||||||
|
- integration tests in Python with Selenium (Firefox + Chrome)
|
||||||
|
- all invariants, preconditions and postconditions of the specification expressed in the form of assertions. Runtime analysis of the specification with JUnit.
|
||||||
|
- automatically generated tests for random automata
|
||||||
|
- performance benchmarks
|
||||||
|
- usability tests
|
||||||
- theory and specification
|
- theory and specification
|
||||||
- scientific papers explaining the theory with appropriate mathematical rigour
|
- scientific papers explaining the theory with appropriate mathematical rigour
|
||||||
- papers with proofs of correctness of essential algorithms
|
- papers with proofs of correctness of essential algorithms
|
||||||
- specification of algorithms expressed with assertions, deeply checked with runtime analysis
|
|
||||||
|
|
||||||
#### 2\. Project limitations
|
#### 2\. Project limitations
|
||||||
|
|
||||||
There are not many limitations in our project. It does not require keeping track of userbase, by using Java we do not impose any restrictions on operating systems and most importantly, we support most of the required features. Some users might feel restricted by not being able to define nondeterministic transducers but in reality it's for the better. Solomonoff has certain restrictions, that are there by design and in practice, they should not be a major issue. We do not support probabilistic transducers, but it's because probabilities do not play well with manually crafted regular expressions and they may lead to unstable solutions in the long term. Solomonoff also hides most of its API from user. Only very minimalist Java methods are exposed and transducers should not be manipulated programmatically. This also is by design, because Solomonoff puts heavy emphasis on the language of regular expressions. It embeds regexes in a special vernacular language that more than compensates lack of programmatic API. It also makes the system more user friendly as a whole.
|
- The minimum required Java version is 1.8. Oracle dropped support for older versions long ago.
|
||||||
|
- Website makes minimal use of CSS3, but older browsers should still be able to use the website.
|
||||||
|
- Internet Explorer is not supported, because Microsoft stopped developing it.
|
||||||
|
- We did not test website for Safari and Edge, but they should work as well.
|
||||||
|
- The build system and command-line interface work on all systems that can run Java. Embedded devices are not supported, as such a use case is unlikely. In the future we might add a lightweight runtime that can execute automata in embedded environments.
|
||||||
|
|
||||||
|
Justifications:
|
||||||
|
|
||||||
|
- Initially we started writing the compiler in C for best performance. Over the course of development it turned out that the performance gains over Java were minimal, while writing C code was much slower than higher-level development in Java. Moreover, Samsung's infrastructure relies heavily on Java, and we found out that Java libraries are always preferred over C. Later we also established cooperation with LearnLib from Dortmund University, and their entire library is written purely in Java. Hence we decided to switch to Java for better compatibility.
|
||||||
|
- We decided to make a website, because this technology is universally accessible. A mobile app would require installation (and a touch screen would be uncomfortable for writing regexes), a command-line interface is only accessible to advanced users, and desktop GUI apps require downloading, installation and setup. An online REPL makes Solomonoff easily accessible to the masses.
|
||||||
|
- The build system was implemented in Java for compatibility with the compiler backend. It is primarily targeted at more advanced users and large projects. The build system allows for working with multiple files, which extends the compiler backend, which is only capable of working with monolithic streams of code.
|
||||||
|
- We considered using user authorization but we decided to keep it simple. Cookies and downloads are our only means of permanent storage. Hiding our REPL behind a "login wall" could potentially turn away some impatient users. There are many demo websites similar to ours that follow a similar strategy and don't retain any user data.
|
||||||
|
- There are plenty of compiler features that we purposely did not implement. We do not support probabilistic automata, because their semantics tend to be unpredictable and difficult to control with regexes. We do not allow epsilon transitions, which enables many optimisations. More such examples and technical details can be found in our documentation.
|
||||||
|
- The build system does not support namespaces. Instead we took an approach similar to C, where "modules" are not a first-class language feature and are instead based on a naming convention. When it comes to language features we are strong believers in simplicity and follow the mantra of "less means more".
|
||||||
|
|
||||||
|
|
||||||
#### 3\. List of functional requirements
|
#### 3\. List of functional requirements
|
||||||
- basic usage via regular expressions: union, concatenation, kleene closure, composition, inversion
|
|
||||||
- advanced usage via extensibility and native functions: Glushkov's construction supports calls to external functions
|
- Java API:
|
||||||
- ease of setup and efficient compilation: easy to use build system
|
- load/save transducer from/to file
|
||||||
- user-friendly REPL from web browser without any required setup: compiler integrated in form of a library used by Spring backend
|
- compile regular expression
|
||||||
|
- run transducer
|
||||||
|
- create multiple independent instances of compiler that can work in parallel
|
||||||
|
- Build system
|
||||||
|
- load one or more files
|
||||||
|
- use transducers defined in other files
|
||||||
|
- compile files in project in parallel
|
||||||
|
- define list of source files in build configuration
|
||||||
|
- store many independent configuration files, even in the same directory
|
||||||
|
- run REPL after building project
|
||||||
|
- Online REPL
|
||||||
|
- open website and follow tutorial (shows additional tips for first time)
|
||||||
|
- compile a larger piece of code and then experiment with it in REPL
|
||||||
|
- compile code line by line in REPL
|
||||||
|
- read the technical documentation
|
||||||
|
- reopen website and continue where you left off (depending on time limit some things might be lost. The server should not store compiler instances indefinitely)
|
||||||
|
- download work progress locally
|
||||||
|
- go to GitHub page/download compiler and build system JAR
|
||||||
|
- Language functionalities:
|
||||||
|
- union, concatenation, kleene closure, output, composition, difference, inversion, identity, clear output
|
||||||
|
- inference: RPNI, RPNI-EDSM, RPNI-MEALY, OSTIA
|
||||||
|
- weights, reflections, functional nondeterminism
|
||||||
|
- ambiguous nondeterminism detection, typechecking
|
||||||
|
- lazy composition, linear programs, hoare-triples
|
||||||
|
- external native functions, optional user extensions
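The requirements above ask for multiple independent compiler instances that can work in parallel and for parallel compilation of project files. A minimal sketch of that pattern with the standard `ExecutorService` follows. The nested `Compiler` class is a hypothetical stand-in and not Solomonoff's real Java API; only the concurrency structure is the point here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelBuild {
    /** Hypothetical placeholder for a compiler instance; the real API may look different. */
    static final class Compiler {
        String compile(String source) {
            // Placeholder work: a real compiler would parse and build a transducer here.
            return "compiled(" + source.trim() + ")";
        }
    }

    /** Compiles every source with its own Compiler instance; results keep the input order. */
    static List<String> compileAll(List<String> sources, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String source : sources) {
                // One instance per task, so no compiler state is shared between threads.
                pending.add(pool.submit(() -> new Compiler().compile(source)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("build interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("compilation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAll(Arrays.asList("x = 'abcd'", "y = 'efgh'"), 2));
    }
}
```

Giving each task its own instance avoids locking altogether, at the cost of not sharing definitions between files; a build system that lets files use transducers defined elsewhere would need to order tasks by their dependencies first.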
|
||||||
|
|
||||||
|
|
||||||
#### 4\. List of non-functional requirements
|
#### 4\. List of non-functional requirements
|
||||||
@ -61,6 +121,7 @@ There are not many limitations in our project. It does not require keeping track
|
|||||||
- integration in Samsung
|
- integration in Samsung
|
||||||
- integration with LearnLib
|
- integration with LearnLib
|
||||||
- performance benchmarks
|
- performance benchmarks
|
||||||
|
- accessible and easy tutorials even for less technical users like linguists
|
||||||
|
|
||||||
|
|
||||||
#### 5\. Measurable indicators
|
#### 5\. Measurable indicators
|
||||||
@ -70,26 +131,67 @@ There are not many limitations in our project. It does not require keeping track
|
|||||||
- disk usage
|
- disk usage
|
||||||
- execution speed
|
- execution speed
|
||||||
- compilation speed
|
- compilation speed
|
||||||
- parallel compilation
|
- list of features
|
||||||
- prepackaged executable available for download
|
- contributions to LearnLib
|
||||||
|
- deployment on http://solomonoff.projektstudencki.pl/
|
||||||
- unit tests
|
- unit tests
|
||||||
|
- integration tests
|
||||||
- user experience feedback
|
- user experience feedback
|
||||||
|
|
||||||
|
|
||||||
#### 6\. Acceptance criteria for first semester
|
#### 6\. Acceptance criteria for first semester
|
||||||
|
|
||||||
- basic regular expressions: union, concatenation, Kleene closure
|
- required:
|
||||||
- basic online playground with syntax highlighter and WebAssembly
|
- C compiler implementation:
|
||||||
- prototype of type system and weighted automata
|
- union,
|
||||||
|
- concatenation,
|
||||||
|
- Kleene closure
|
||||||
|
- output
|
||||||
|
- execution
|
||||||
|
- Java prototype, theory and specification
|
||||||
|
- Glushkov's construction with variables
|
||||||
|
- type system
|
||||||
|
- nondeterminism detection
|
||||||
|
- binary search execution
|
||||||
|
- online compiler
|
||||||
|
- WebAssembly bindings
|
||||||
|
- Ace editor and syntax highlighting
|
||||||
|
- website design
|
||||||
|
- expected:
|
||||||
|
- usable precompiled delivery
|
||||||
|
- optimised algorithms
|
||||||
|
- additional operations (composition, inverse, subtraction)
|
||||||
|
- planned:
|
||||||
|
- support for formal verification
|
||||||
|
- inductive inference
|
||||||
|
- optimisations
|
||||||
|
- fully developed compiler
|
||||||
|
- tutorials, examples, how-tos
|
||||||
|
- extensive testing
|
||||||
|
|
||||||
#### 7\. Acceptance criteria for second semester
|
#### 7\. Acceptance criteria for second semester
|
||||||
|
|
||||||
- extended regular expressions
|
- required:
|
||||||
- optimised compiler
|
- fully usable optimised compiler with all additional features
|
||||||
- inductive inference algorithms
|
- working with multiple source files
|
||||||
- build system
|
- inductive inference
|
||||||
- REPL
|
- tutorials, examples, how-tos
|
||||||
- online playground with backend, docs, samples and tutorials
|
- compatibility with client's existing Java infrastructure
|
||||||
|
- compatibility with LearnLib
|
||||||
|
- expected:
|
||||||
|
- secondary compiler features (graph visualisation, export/import, external utility functions)
|
||||||
|
- parallel compilation
|
||||||
|
- configurable build system
|
||||||
|
- inductive inference artifacts as build dependencies
|
||||||
|
- scripts for automated integration tests
|
||||||
|
- great performance benchmarks
|
||||||
|
- detailed technical documentation
|
||||||
|
- planned:
|
||||||
|
- partial inductive inference (OSTIA-C) for LearnLib
|
||||||
|
- Thrax-Solomonoff converter for backward compatibility with legacy systems
|
||||||
|
- Video tutorials
|
||||||
|
- advanced online code editor/full online IDE
|
||||||
|
- extensible build system with plugins and repositories
|
||||||
|
|
||||||
#### 8\. Project work organization
|
#### 8\. Project work organization
|
||||||
|
|
||||||
@ -98,34 +200,48 @@ There are not many limitations in our project. It does not require keeping track
|
|||||||
- weighted transducers
|
- weighted transducers
|
||||||
- inductive inference
|
- inductive inference
|
||||||
- nondeterministic minimization
|
- nondeterministic minimization
|
||||||
|
-
|
||||||
- Bogdan Bondar (implementation)
|
- Bogdan Bondar (implementation)
|
||||||
- backend
|
- Spring backend
|
||||||
- frontend
|
- frontend
|
||||||
- compiler integration
|
- compiler integration
|
||||||
|
- testing (Selenium, unittest, JUnit)
|
||||||
- Marcin Jabłoński (implementation)
|
- Marcin Jabłoński (implementation)
|
||||||
- build system
|
- build system (Java)
|
||||||
- repl
|
- repl
|
||||||
- dependency resolver
|
- dependency resolver
|
||||||
|
- compiler development (assistance and C implementation)
|
||||||
|
- compiler extension for handling multi-file projects
|
||||||
|
|
||||||
Aleksander Mendoza is responsible for finding clients and communicating with them.
|
Aleksander Mendoza is responsible for finding clients and communicating with them.
|
||||||
|
|
||||||
Initially our team attempted to use Scrum, but later we switched to an incremental methodology, because the workflow relied heavily on specification and long-term planning. Scrum's main advantage lies in its flexibility, which wasn't key for this project. It also imposed unrealistic and unnatural team dynamics, which only made work more complicated than it had to be. Scrum gives all team members a high degree of independence and autonomy. In the Scrum chat, implementers describe the progress they have made. In our project, on the other hand, the specification is more rigid and work progresses according to it. Hence, it's always well understood who does what at what moment. The future tasks are generally known ahead of time.
|
Initially our team attempted to use Scrum, but later we switched to an incremental methodology, because the workflow relied heavily on specification and long-term planning. Scrum's main advantage lies in its flexibility, which wasn't key for this project. It also imposed unrealistic and unnatural team dynamics, which only made work more complicated than it had to be. Scrum gives all team members a high degree of independence and autonomy. In the Scrum chat, implementers describe the progress they have made. In our project, on the other hand, the specification is more rigid and work progresses according to it. Hence, it's always well understood who does what at which moment. The future tasks are generally known ahead of time.
|
||||||
|
|
||||||
Tools:
|
Tools:
|
||||||
|
|
||||||
- JIRA
|
- JIRA
|
||||||
- git & GitHub
|
- git & GitHub
|
||||||
|
- CircleCI
|
||||||
|
- Selenium for integration tests
|
||||||
|
- MS Teams for video chats, Messenger for daily quick chat
|
||||||
|
|
||||||
We do not use continuous integration, as the project is not meant to be continuously integrated at all. The compiler and library are released in versions.
|
We created a full, detailed list of planned tasks at the beginning of the semester and tried to follow it, but we also added unforeseen tasks on a rolling basis as necessary.
|
||||||
REPL and website are meant to follow this versioning as well.
|
Every task corresponded to some tangible feature, and implementing that feature allowed the task to be closed.
|
||||||
|
|
||||||
#### 9\. Project risks
|
#### 9\. Project risks
|
||||||
|
|
||||||
The most important risk of our project was its heavy reliance on advanced theoretical concepts. It required plenty of rigour to make sure our foundations were correct and well defined. Should anything in our understanding of automata be wrong, the whole project would be at risk of becoming irrelevant.
|
- The most important risk of our project was its heavy reliance on advanced theoretical concepts. It required plenty of rigour to make sure our foundations were correct and well defined. Should anything in our understanding of automata be wrong, the whole project would be at risk of becoming irrelevant.
|
||||||
|
|
||||||
The second most critical concern was time. There was plenty to do and very little time. It was hard to estimate how much time any of the tasks would take. While missing initial deadlines due to unforeseen complications is typical for software engineering projects, our project was exposed to such risks on a much larger scale. Should anything be wrong in the formal specification, it could require months of additional research. In the worst case, if there was a mistake, some goals might turn out to be mathematically impossible. For this reason our team had to be rigorous about its promises.
|
- The second most critical concern was time. There was plenty to do and very little time. It was hard to estimate how long any of the tasks would take. While missing initial deadlines due to unforeseen complications is typical for software engineering projects, our project was exposed to such risks on a much larger scale. Should anything be wrong in the formal specification, it could require months of additional research. In the worst case, if there was a mistake, some goals might turn out to be mathematically impossible. For this reason our team had to be rigorous about its promises.
|
||||||
|
|
||||||
There was little risk with respect to technologies, although the most significant one was choosing the appropriate infrastructure for an open-source non-profit project. We can't bear large maintenance costs, therefore we used open-source and free resources as much as possible. While in the end the project had to switch from WebAssembly to using a backend, we hope that in the long term we will find hosting sponsored by some research institution. In the worst case, should we not get any sponsorship, most of the project is resilient and can remain relevant even without a backend. The web interface could then be treated as an optional addition that the user can download and run locally as an alternative to the terminal-based interface.
|
- The organization of work was a challenge. The project often required us to learn new technologies and solve nontrivial problems. Our team often got stuck on challenging problems, and sometimes we had to change plans because some of them turned out to be technically impossible:
|
||||||
|
- we struggled with JWebAssembly and in the end switched to Spring
|
||||||
|
- the low-level C implementation was progressing too slowly and we faced the risk of not delivering on time
|
||||||
|
- after the first semester we had gained plenty of experience developing the Java prototype and noticed many details that could be done better than we initially planned. We took the drastic decision to rewrite the compiler in Java, which was seen as risky.
|
||||||
|
|
||||||
|
Due to these and many other difficulties, our team could have failed on multiple occasions.
|
||||||
|
|
||||||
|
- Our project is very niche and finding clients is not easy. If any of our clients lost interest in our solution, finding a new one might become impossible.
|
||||||
|
|
||||||
#### 10\. Milestones
|
#### 10\. Milestones
|
||||||
|
|
||||||
|