This project focuses on research in the field of automata theory and inductive inference. While many existing libraries already provide support for general purpose automata and implement various related algorithms, this project takes a slightly different approach. The primary tool for working with the library, is through domain specific language of regular expressions. Most of the things can be done without writing even a single line of Java code. We provide commandline as well as web interface and integrated build system.
The main applications concern formal methods, natural language processing, state-based system modeling, pattern recognition, inductive inference and machine learning. It can be of great help for researchers as well as can be used on industrial scale.
The greatest competitor for Solomonoff is Google's OpenFST project. Solomonoff introduces dozens of improvements. The most important being efficiency improvements, symbolic ranges, nondeterminism detection and type system.
OpenFst was a niche project with sparse documentation and neglected user interface. Everything is done primarily in C++ with templates, while commandline interface is basic, regular expression language has many fundamental flaws, their build system consists of tool for generating Makefiles that only call the compiler. Solomonoff strives to bring improvement with the following features:
- We have online REPL with examples and guides where everything works out of the box. Everything can be done with our rich and carefully designed regular expression language, so there is no need to write code.
- We provide extensive documentation in form of 3 scientific papers, detailed technical documentation of compiler implementation, extensive GitHub page and interactive REPL tutorial.
- We ship with integrated build system that supports parallelism and interacts with compiler directly for optimum performance. It can be configured in TOML build files.
- There are plentiful of technical improvements, performance optimisations and innovative algorithms. All of them are described in detail in our documentation. We also provide performance benchamrks on our GitHub
The project found approval among members of Samsung's R&D team for Bixby developement and inductive inference researchers from Dortmund Technical University, Germany. Our system will be deployed in Samsung, our website can aid linguists with easily learning Solomonoff and our build system should make commandline usage more accessible.
Our project brings performance, which was measured with benchmarks (available on GitGub). We provide innovation, which can be easily noticed by the sheer amount of features Solomonoff implements, which are not available in any other existing tool (more information in scientific papers). We are user-friendly and we assesed it during usability tests and by collecting feedback from end users.
Currently there exists only one serious alternative, which is `OpenFST` library with their Thrax extension for writing regex-like grammars. Their solution has numerous problems. It's focus on probabilistic approach to modeling nondeterminism, made the library quite slow. It also became a double-edged sword, by making rule-based system difficult to maintain (compiler doesn't warn programmer when nondeterminism causes some rules to overshadow others). Compilation of grammars is lacking in many aspects. The grammar expression language is very basic and obscure. Compiler is not parallized and highly inefficient. On top of that, the probabilistic approach.
- Our solution completely gets rid of nondeterminism. We will not attempt to model any probabilistic models. We will focus more heavily on making the expression language user-friendly and helpful in detecting potential non-determinism. The compiler should be able to process multiple rules in parallel to make compilation time faster. We implement Glushkov's construction and take advantage of special properties of functional transducers. For better user experience and easier management of code, we provide integrated build system and online REPL with extensive documentation.
There used to be another similar project, developed by AT&T but it's been long discontinued and replaced by OpenFST. Aside from those two competitors, the market is very niche. There do not exist any other tools (or at least, they are not available publicly) strictly for compiling transducer. The closest other competitors might include general-purpose automata libraries like Google's `RE2` or Anders Møller's `BRICS` library. However, those projects are fundamentally different as they implement classical automata instead of tranasducers. As a result, Solomonoff is strictly more powerful and superior to those solutions. There is no other automata compiler with support for inductive inference. There is no ther symbolic transducer library (although there exist general-purpose libraries for symbolic automata, like `symbolicautomata`, `Microsoft Automata Library`, `Rex`, `Bex`, `Fast`, `Mona`).
A simple and efficient library written in C will be the main and primary component of our product. On top of that, it will have command-line interface equipped with compiler. For easy and quick access, we should support online repl for all curious people who want to give our library a try. The compiler should support parallelism, warn user about non-determinism and allow for possibly some extent of generic programming (by defining functions working on regular expressions or bulk-generation of rules according to some regularities). We should ,however pay extra attention, to not making this language turing complete/undecidable by accident (otherwise compilation might never end).
- Java compiler uses as few libraries as possible: ANTLR for parsing, LearnLib for inductive inference
- REPL is developed on top of compiler backend and is not included in the compiler itself (although compiler provides certain facilities necessary for implementing REPL)
- build system is developed and shipped independently from compiler backend
- online REPL with backend in Spring, uses compiler's Java API.
- the REPL used by build system and web brwoser has overlaping features but due to inherent differences between the two enviroments their implementations differ a little.
- website contains documentation, examples and tutorials
- assisting with compiler implementation (initial but discontinued C version)
Limitations:
- no official support for mobile devices. Website is responsive but no mobile-optimised version was added. End users should not code on their smartphones, which is not a common practice either way.
- no/limited compiler backend for embedded devices. While JavaME does exist, the compiler was not meant to be deployed in embedded systems.
- no (centralised) package manager for build-system. Developing package manager infrastructure is a prohibitively expensive operation.