diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..fc8a5de --- /dev/null +++ b/LICENSE @@ -0,0 +1,165 @@ + GNU LESSER GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + + This version of the GNU Lesser General Public License incorporates +the terms and conditions of version 3 of the GNU General Public +License, supplemented by the additional permissions listed below. + + 0. Additional Definitions. + + As used herein, "this License" refers to version 3 of the GNU Lesser +General Public License, and the "GNU GPL" refers to version 3 of the GNU +General Public License. + + "The Library" refers to a covered work governed by this License, +other than an Application or a Combined Work as defined below. + + An "Application" is any work that makes use of an interface provided +by the Library, but which is not otherwise based on the Library. +Defining a subclass of a class defined by the Library is deemed a mode +of using an interface provided by the Library. + + A "Combined Work" is a work produced by combining or linking an +Application with the Library. The particular version of the Library +with which the Combined Work was made is also called the "Linked +Version". + + The "Minimal Corresponding Source" for a Combined Work means the +Corresponding Source for the Combined Work, excluding any source code +for portions of the Combined Work that, considered in isolation, are +based on the Application, and not on the Linked Version. + + The "Corresponding Application Code" for a Combined Work means the +object code and/or source code for the Application, including any data +and utility programs needed for reproducing the Combined Work from the +Application, but excluding the System Libraries of the Combined Work. + + 1. Exception to Section 3 of the GNU GPL. 
+ + You may convey a covered work under sections 3 and 4 of this License +without being bound by section 3 of the GNU GPL. + + 2. Conveying Modified Versions. + + If you modify a copy of the Library, and, in your modifications, a +facility refers to a function or data to be supplied by an Application +that uses the facility (other than as an argument passed when the +facility is invoked), then you may convey a copy of the modified +version: + + a) under this License, provided that you make a good faith effort to + ensure that, in the event an Application does not supply the + function or data, the facility still operates, and performs + whatever part of its purpose remains meaningful, or + + b) under the GNU GPL, with none of the additional permissions of + this License applicable to that copy. + + 3. Object Code Incorporating Material from Library Header Files. + + The object code form of an Application may incorporate material from +a header file that is part of the Library. You may convey such object +code under terms of your choice, provided that, if the incorporated +material is not limited to numerical parameters, data structure +layouts and accessors, or small macros, inline functions and templates +(ten or fewer lines in length), you do both of the following: + + a) Give prominent notice with each copy of the object code that the + Library is used in it and that the Library and its use are + covered by this License. + + b) Accompany the object code with a copy of the GNU GPL and this license + document. + + 4. Combined Works. 
+ + You may convey a Combined Work under terms of your choice that, +taken together, effectively do not restrict modification of the +portions of the Library contained in the Combined Work and reverse +engineering for debugging such modifications, if you also do each of +the following: + + a) Give prominent notice with each copy of the Combined Work that + the Library is used in it and that the Library and its use are + covered by this License. + + b) Accompany the Combined Work with a copy of the GNU GPL and this license + document. + + c) For a Combined Work that displays copyright notices during + execution, include the copyright notice for the Library among + these notices, as well as a reference directing the user to the + copies of the GNU GPL and this license document. + + d) Do one of the following: + + 0) Convey the Minimal Corresponding Source under the terms of this + License, and the Corresponding Application Code in a form + suitable for, and under terms that permit, the user to + recombine or relink the Application with a modified version of + the Linked Version to produce a modified Combined Work, in the + manner specified by section 6 of the GNU GPL for conveying + Corresponding Source. + + 1) Use a suitable shared library mechanism for linking with the + Library. A suitable mechanism is one that (a) uses at run time + a copy of the Library already present on the user's computer + system, and (b) will operate properly with a modified version + of the Library that is interface-compatible with the Linked + Version. + + e) Provide Installation Information, but only if you would otherwise + be required to provide such information under section 6 of the + GNU GPL, and only to the extent that such information is + necessary to install and execute a modified version of the + Combined Work produced by recombining or relinking the + Application with a modified version of the Linked Version. 
(If + you use option 4d0, the Installation Information must accompany + the Minimal Corresponding Source and Corresponding Application + Code. If you use option 4d1, you must provide the Installation + Information in the manner specified by section 6 of the GNU GPL + for conveying Corresponding Source.) + + 5. Combined Libraries. + + You may place library facilities that are a work based on the +Library side by side in a single library together with other library +facilities that are not Applications and are not covered by this +License, and convey such a combined library under terms of your +choice, if you do both of the following: + + a) Accompany the combined library with a copy of the same work based + on the Library, uncombined with any other library facilities, + conveyed under the terms of this License. + + b) Give prominent notice with the combined library that part of it + is a work based on the Library, and explaining where to find the + accompanying uncombined form of the same work. + + 6. Revised Versions of the GNU Lesser General Public License. + + The Free Software Foundation may publish revised and/or new versions +of the GNU Lesser General Public License from time to time. Such new +versions will be similar in spirit to the present version, but may +differ in detail to address new problems or concerns. + + Each version is given a distinguishing version number. If the +Library as you received it specifies that a certain numbered version +of the GNU Lesser General Public License "or any later version" +applies to it, you have the option of following the terms and +conditions either of that published version or of any later version +published by the Free Software Foundation. If the Library as you +received it does not specify a version number of the GNU Lesser +General Public License, you may choose any version of the GNU Lesser +General Public License ever published by the Free Software Foundation. 
+
+  If the Library as you received it specifies that a proxy can decide
+whether future versions of the GNU Lesser General Public License shall
+apply, that proxy's public statement of acceptance of any version is
+permanent authorization for you to choose that version for the
+Library.
diff --git a/MANIFEST.MF b/MANIFEST.MF
new file mode 100644
index 0000000..262b5dc
--- /dev/null
+++ b/MANIFEST.MF
@@ -0,0 +1,3 @@
+Manifest-Version: 1.0
+Main-Class: net.sourceforge.jFuzzyLogic.JFuzzyLogic
+
diff --git a/README.txt b/README.txt
new file mode 100644
index 0000000..fd4cd21
--- /dev/null
+++ b/README.txt
@@ -0,0 +1,4 @@
+
+Documentation
+
+	http://jfuzzylogic.sourceforge.net
diff --git a/README_release.txt b/README_release.txt
new file mode 100644
index 0000000..9dd1640
--- /dev/null
+++ b/README_release.txt
@@ -0,0 +1,73 @@
+
+
+	Release instructions
+	--------------------
+
+
+Main JAR file
+-------------
+
+	1) Create jFuzzyLogic.jar file
+
+		Eclipse -> Package explorer -> jFuzzyLogic -> Select file jFuzzyLogic.jardesc -> Right click "Create JAR"
+
+	2) Upload JAR file to SourceForge (use sf.net menu)
+
+
+HTML pages
+----------
+
+	1) Upload HTML pages to SourceForge
+
+		cd ~/workspace/jFuzzyLogic
+		scp index.html pcingola,jfuzzylogic@frs.sourceforge.net:htdocs/
+
+		cd ~/workspace/jFuzzyLogic/html
+		scp *.{html,css} pcingola,jfuzzylogic@frs.sourceforge.net:htdocs/html
+		scp images/*.png pcingola,jfuzzylogic@frs.sourceforge.net:htdocs/html/images/
+		scp videos/*.swf pcingola,jfuzzylogic@frs.sourceforge.net:htdocs/html/videos/
+		scp -r assets dist fcl pdf pcingola,jfuzzylogic@frs.sourceforge.net:htdocs/html/
+
+Eclipse plugin
+--------------
+
+	1) Create small jFuzzyLogic.jar file (it's better to use a small file and not the big JAR file that has all source files)
+
+		cd ~/workspace/jFuzzyLogic/
+		ant
+
+		# Check the JAR file
+		cd
+		java -jar jFuzzyLogic.jar
+
+
+	2) Copy jFuzzyLogic.jar file to UI project
+
+		cp jFuzzyLogic.jar \
net.sourceforge.jFuzzyLogic.Fcl.ui/lib/jFuzzyLogic.jar
+
+	3) Build eclipse update site
+
+		In Eclipse:
+		- In package explorer, refresh all net.sourceforge.jFuzzyLogic.Fcl.* projects
+
+		- Open the net.sourceforge.jFuzzyLogic.Fcl.updateSite project
+		- Delete the contents of the 'plugins' and 'features' dirs
+
+			cd ~/workspace/net.sourceforge.jFuzzyLogic.Fcl.updateSite
+			rm -vf *.jar plugins/*.jar features/*.jar
+
+		- Open site.xml file
+		- Go to "Site Map" tab
+
+		- Open jFuzzyLogic category, remove the 'feature' (called something like "net.sourceforge.jFuzzyLogic.Fcl.sdk_1.1.0.201212101535.jar")
+		  and add it again (just to be sure)
+
+		- Click the "Build All" button
+
+		- Refresh the project (you should see the JAR files in the plugin folders now).
+
+	4) Upload Eclipse plugin files to SourceForge (Eclipse update site)
+
+		cd ~/workspace/net.sourceforge.jFuzzyLogic.Fcl.updateSite
+		scp -r . pcingola,jfuzzylogic@frs.sourceforge.net:htdocs/eclipse/
+
diff --git a/antlr_3_1_source/Tool.java b/antlr_3_1_source/Tool.java
new file mode 100644
index 0000000..a2bbe5d
--- /dev/null
+++ b/antlr_3_1_source/Tool.java
@@ -0,0 +1,659 @@
+/*
+ [The "BSD licence"]
+ Copyright (c) 2005-2008 Terence Parr
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+ 1. Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+ 2. Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in the
+    documentation and/or other materials provided with the distribution.
+ 3. The name of the author may not be used to endorse or promote products
+    derived from this software without specific prior written permission.
+ + THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ +package org.antlr; + +import org.antlr.analysis.*; +import org.antlr.codegen.CodeGenerator; +import org.antlr.runtime.misc.Stats; +import org.antlr.tool.*; + +import java.io.*; +import java.util.*; + +/** The main ANTLR entry point. Read a grammar and generate a parser. 
*/ +public class Tool { + public static final String VERSION = "3.1"; + + public static final String UNINITIALIZED_DIR = ""; + + // Input parameters / option + + protected List grammarFileNames = new ArrayList(); + protected boolean generate_NFA_dot = false; + protected boolean generate_DFA_dot = false; + protected String outputDirectory = UNINITIALIZED_DIR; + protected String libDirectory = "."; + protected boolean debug = false; + protected boolean trace = false; + protected boolean profile = false; + protected boolean report = false; + protected boolean printGrammar = false; + protected boolean depend = false; + protected boolean forceAllFilesToOutputDir = false; + protected boolean deleteTempLexer = true; + + // the internal options are for my use on the command line during dev + + public static boolean internalOption_PrintGrammarTree = false; + public static boolean internalOption_PrintDFA = false; + public static boolean internalOption_ShowNFAConfigsInDFA = false; + public static boolean internalOption_watchNFAConversion = false; + + public static void main(String[] args) { + ErrorManager.info("ANTLR Parser Generator Version " + + VERSION + " (August 12, 2008) 1989-2008"); + Tool antlr = new Tool(args); + antlr.process(); + if ( ErrorManager.getNumErrors() > 0 ) { + System.exit(1); + } + System.exit(0); + } + + public Tool() { + } + + public Tool(String[] args) { + processArgs(args); + } + + public void processArgs(String[] args) { + if ( args==null || args.length==0 ) { + help(); + return; + } + for (int i = 0; i < args.length; i++) { + if (args[i].equals("-o") || args[i].equals("-fo")) { + if (i + 1 >= args.length) { + System.err.println("missing output directory with -fo/-o option; ignoring"); + } + else { + if ( args[i].equals("-fo") ) { // force output into dir + forceAllFilesToOutputDir = true; + } + i++; + outputDirectory = args[i]; + if ( outputDirectory.endsWith("/") || + outputDirectory.endsWith("\\") ) + { + outputDirectory = + 
outputDirectory.substring(0,outputDirectory.length()-1); + } + File outDir = new File(outputDirectory); + if( outDir.exists() && !outDir.isDirectory() ) { + ErrorManager.error(ErrorManager.MSG_OUTPUT_DIR_IS_FILE,outputDirectory); + libDirectory = "."; + } + } + } + else if (args[i].equals("-lib")) { + if (i + 1 >= args.length) { + System.err.println("missing library directory with -lib option; ignoring"); + } + else { + i++; + libDirectory = args[i]; + if ( libDirectory.endsWith("/") || + libDirectory.endsWith("\\") ) + { + libDirectory = + libDirectory.substring(0,libDirectory.length()-1); + } + File outDir = new File(libDirectory); + if( !outDir.exists() ) { + ErrorManager.error(ErrorManager.MSG_DIR_NOT_FOUND,libDirectory); + libDirectory = "."; + } + } + } + else if (args[i].equals("-nfa")) { + generate_NFA_dot=true; + } + else if (args[i].equals("-dfa")) { + generate_DFA_dot=true; + } + else if (args[i].equals("-debug")) { + debug=true; + } + else if (args[i].equals("-trace")) { + trace=true; + } + else if (args[i].equals("-report")) { + report=true; + } + else if (args[i].equals("-profile")) { + profile=true; + } + else if (args[i].equals("-print")) { + printGrammar = true; + } + else if (args[i].equals("-depend")) { + depend=true; + } + else if (args[i].equals("-message-format")) { + if (i + 1 >= args.length) { + System.err.println("missing output format with -message-format option; using default"); + } + else { + i++; + ErrorManager.setFormat(args[i]); + } + } + else if (args[i].equals("-Xgrtree")) { + internalOption_PrintGrammarTree=true; // print grammar tree + } + else if (args[i].equals("-Xdfa")) { + internalOption_PrintDFA=true; + } + else if (args[i].equals("-Xnoprune")) { + DFAOptimizer.PRUNE_EBNF_EXIT_BRANCHES=false; + } + else if (args[i].equals("-Xnocollapse")) { + DFAOptimizer.COLLAPSE_ALL_PARALLEL_EDGES=false; + } + else if (args[i].equals("-Xdbgconversion")) { + NFAToDFAConverter.debug = true; + } + else if (args[i].equals("-Xmultithreaded")) { 
+ NFAToDFAConverter.SINGLE_THREADED_NFA_CONVERSION = false; + } + else if (args[i].equals("-Xnomergestopstates")) { + DFAOptimizer.MERGE_STOP_STATES = false; + } + else if (args[i].equals("-Xdfaverbose")) { + internalOption_ShowNFAConfigsInDFA = true; + } + else if (args[i].equals("-Xwatchconversion")) { + internalOption_watchNFAConversion = true; + } + else if (args[i].equals("-XdbgST")) { + CodeGenerator.EMIT_TEMPLATE_DELIMITERS = true; + } + else if (args[i].equals("-Xmaxinlinedfastates")) { + if (i + 1 >= args.length) { + System.err.println("missing max inline dfa states -Xmaxinlinedfastates option; ignoring"); + } + else { + i++; + CodeGenerator.MAX_ACYCLIC_DFA_STATES_INLINE = Integer.parseInt(args[i]); + } + } + else if (args[i].equals("-Xm")) { + if (i + 1 >= args.length) { + System.err.println("missing max recursion with -Xm option; ignoring"); + } + else { + i++; + NFAContext.MAX_SAME_RULE_INVOCATIONS_PER_NFA_CONFIG_STACK = Integer.parseInt(args[i]); + } + } + else if (args[i].equals("-Xmaxdfaedges")) { + if (i + 1 >= args.length) { + System.err.println("missing max number of edges with -Xmaxdfaedges option; ignoring"); + } + else { + i++; + DFA.MAX_STATE_TRANSITIONS_FOR_TABLE = Integer.parseInt(args[i]); + } + } + else if (args[i].equals("-Xconversiontimeout")) { + if (i + 1 >= args.length) { + System.err.println("missing max time in ms -Xconversiontimeout option; ignoring"); + } + else { + i++; + DFA.MAX_TIME_PER_DFA_CREATION = Integer.parseInt(args[i]); + } + } + else if (args[i].equals("-Xnfastates")) { + DecisionProbe.verbose=true; + } + else if (args[i].equals("-X")) { + Xhelp(); + } + else { + if (args[i].charAt(0) != '-') { + // Must be the grammar file + grammarFileNames.add(args[i]); + } + } + } + } + + /* + protected void checkForInvalidArguments(String[] args, BitSet cmdLineArgValid) { + // check for invalid command line args + for (int a = 0; a < args.length; a++) { + if (!cmdLineArgValid.member(a)) { + System.err.println("invalid command-line 
argument: " + args[a] + "; ignored"); + } + } + } + */ + + public void process() { + int numFiles = grammarFileNames.size(); + boolean exceptionWhenWritingLexerFile = false; + String lexerGrammarFileName = null; // necessary at this scope to have access in the catch below + for (int i = 0; i < numFiles; i++) { + String grammarFileName = (String) grammarFileNames.get(i); + if ( numFiles > 1 && !depend ) { + System.out.println(grammarFileName); + } + try { + if ( depend ) { + BuildDependencyGenerator dep = + new BuildDependencyGenerator(this, grammarFileName); + List outputFiles = dep.getGeneratedFileList(); + List dependents = dep.getDependenciesFileList(); + //System.out.println("output: "+outputFiles); + //System.out.println("dependents: "+dependents); + System.out.println(dep.getDependencies()); + continue; + } + Grammar grammar = getRootGrammar(grammarFileName); + // we now have all grammars read in as ASTs + // (i.e., root and all delegates) + grammar.composite.assignTokenTypes(); + grammar.composite.defineGrammarSymbols(); + grammar.composite.createNFAs(); + + generateRecognizer(grammar); + + if ( printGrammar ) { + grammar.printGrammar(System.out); + } + + if ( report ) { + GrammarReport report = new GrammarReport(grammar); + System.out.println(report.toString()); + // print out a backtracking report too (that is not encoded into log) + System.out.println(report.getBacktrackingReport()); + // same for aborted NFA->DFA conversions + System.out.println(report.getAnalysisTimeoutReport()); + } + if ( profile ) { + GrammarReport report = new GrammarReport(grammar); + Stats.writeReport(GrammarReport.GRAMMAR_STATS_FILENAME, + report.toNotifyString()); + } + + // now handle the lexer if one was created for a merged spec + String lexerGrammarStr = grammar.getLexerGrammar(); + //System.out.println("lexer grammar:\n"+lexerGrammarStr); + if ( grammar.type==Grammar.COMBINED && lexerGrammarStr!=null ) { + lexerGrammarFileName = 
grammar.getImplicitlyGeneratedLexerFileName(); + try { + Writer w = getOutputFile(grammar,lexerGrammarFileName); + w.write(lexerGrammarStr); + w.close(); + } + catch (IOException e) { + // emit different error message when creating the implicit lexer fails + // due to write permission error + exceptionWhenWritingLexerFile = true; + throw e; + } + try { + StringReader sr = new StringReader(lexerGrammarStr); + Grammar lexerGrammar = new Grammar(); + lexerGrammar.composite.watchNFAConversion = internalOption_watchNFAConversion; + lexerGrammar.implicitLexer = true; + lexerGrammar.setTool(this); + File lexerGrammarFullFile = + new File(getFileDirectory(lexerGrammarFileName),lexerGrammarFileName); + lexerGrammar.setFileName(lexerGrammarFullFile.toString()); + + lexerGrammar.importTokenVocabulary(grammar); + lexerGrammar.parseAndBuildAST(sr); + + sr.close(); + + lexerGrammar.composite.assignTokenTypes(); + lexerGrammar.composite.defineGrammarSymbols(); + lexerGrammar.composite.createNFAs(); + + generateRecognizer(lexerGrammar); + } + finally { + // make sure we clean up + if ( deleteTempLexer ) { + File outputDir = getOutputDirectory(lexerGrammarFileName); + File outputFile = new File(outputDir, lexerGrammarFileName); + outputFile.delete(); + } + } + } + } + catch (IOException e) { + if (exceptionWhenWritingLexerFile) { + ErrorManager.error(ErrorManager.MSG_CANNOT_WRITE_FILE, + lexerGrammarFileName, e); + } else { + ErrorManager.error(ErrorManager.MSG_CANNOT_OPEN_FILE, + grammarFileName); + } + } + catch (Exception e) { + ErrorManager.error(ErrorManager.MSG_INTERNAL_ERROR, grammarFileName, e); + } + /* + finally { + System.out.println("creates="+ Interval.creates); + System.out.println("hits="+ Interval.hits); + System.out.println("misses="+ Interval.misses); + System.out.println("outOfRange="+ Interval.outOfRange); + } + */ + } + } + + /** Get a grammar mentioned on the command-line and any delegates */ + public Grammar getRootGrammar(String grammarFileName) + throws 
IOException + { + //StringTemplate.setLintMode(true); + // grammars mentioned on command line are either roots or single grammars. + // create the necessary composite in case it's got delegates; even + // single grammar needs it to get token types. + CompositeGrammar composite = new CompositeGrammar(); + Grammar grammar = new Grammar(this,grammarFileName,composite); + composite.setDelegationRoot(grammar); + FileReader fr = null; + fr = new FileReader(grammarFileName); + BufferedReader br = new BufferedReader(fr); + grammar.parseAndBuildAST(br); + composite.watchNFAConversion = internalOption_watchNFAConversion; + br.close(); + fr.close(); + return grammar; + } + + /** Create NFA, DFA and generate code for grammar. + * Create NFA for any delegates first. Once all NFA are created, + * it's ok to create DFA, which must check for left-recursion. That check + * is done by walking the full NFA, which therefore must be complete. + * After all NFA, comes DFA conversion for root grammar then code gen for + * root grammar. DFA and code gen for delegates comes next. 
+ */ + protected void generateRecognizer(Grammar grammar) { + String language = (String)grammar.getOption("language"); + if ( language!=null ) { + CodeGenerator generator = new CodeGenerator(this, grammar, language); + grammar.setCodeGenerator(generator); + generator.setDebug(debug); + generator.setProfile(profile); + generator.setTrace(trace); + + // generate NFA early in case of crash later (for debugging) + if ( generate_NFA_dot ) { + generateNFAs(grammar); + } + + // GENERATE CODE + generator.genRecognizer(); + + if ( generate_DFA_dot ) { + generateDFAs(grammar); + } + + List delegates = grammar.getDirectDelegates(); + for (int i = 0; delegates!=null && i < delegates.size(); i++) { + Grammar delegate = (Grammar)delegates.get(i); + if ( delegate!=grammar ) { // already processing this one + generateRecognizer(delegate); + } + } + } + } + + public void generateDFAs(Grammar g) { + for (int d=1; d<=g.getNumberOfDecisions(); d++) { + DFA dfa = g.getLookaheadDFA(d); + if ( dfa==null ) { + continue; // not there for some reason, ignore + } + DOTGenerator dotGenerator = new DOTGenerator(g); + String dot = dotGenerator.getDOT( dfa.startState ); + String dotFileName = g.name+"."+"dec-"+d; + if ( g.implicitLexer ) { + dotFileName = g.name+Grammar.grammarTypeToFileNameSuffix[g.type]+"."+"dec-"+d; + } + try { + writeDOTFile(g, dotFileName, dot); + } + catch(IOException ioe) { + ErrorManager.error(ErrorManager.MSG_CANNOT_GEN_DOT_FILE, + dotFileName, + ioe); + } + } + } + + protected void generateNFAs(Grammar g) { + DOTGenerator dotGenerator = new DOTGenerator(g); + Collection rules = g.getAllImportedRules(); + rules.addAll(g.getRules()); + + for (Iterator itr = rules.iterator(); itr.hasNext();) { + Rule r = (Rule) itr.next(); + try { + String dot = dotGenerator.getDOT(r.startState); + if ( dot!=null ) { + writeDOTFile(g, r, dot); + } + } + catch (IOException ioe) { + ErrorManager.error(ErrorManager.MSG_CANNOT_WRITE_FILE, ioe); + } + } + } + + protected void 
writeDOTFile(Grammar g, Rule r, String dot) throws IOException { + writeDOTFile(g, r.grammar.name+"."+r.name, dot); + } + + protected void writeDOTFile(Grammar g, String name, String dot) throws IOException { + Writer fw = getOutputFile(g, name+".dot"); + fw.write(dot); + fw.close(); + } + + private static void help() { + System.err.println("usage: java org.antlr.Tool [args] file.g [file2.g file3.g ...]"); + System.err.println(" -o outputDir specify output directory where all output is generated"); + System.err.println(" -fo outputDir same as -o but force even files with relative paths to dir"); + System.err.println(" -lib dir specify location of token files"); + System.err.println(" -depend generate file dependencies"); + System.err.println(" -report print out a report about the grammar(s) processed"); + System.err.println(" -print print out the grammar without actions"); + System.err.println(" -debug generate a parser that emits debugging events"); + System.err.println(" -profile generate a parser that computes profiling information"); + System.err.println(" -nfa generate an NFA for each rule"); + System.err.println(" -dfa generate a DFA for each decision point"); + System.err.println(" -message-format name specify output style for messages"); + System.err.println(" -X display extended argument list"); + } + + private static void Xhelp() { + System.err.println(" -Xgrtree print the grammar AST"); + System.err.println(" -Xdfa print DFA as text "); + System.err.println(" -Xnoprune test lookahead against EBNF block exit branches"); + System.err.println(" -Xnocollapse collapse incident edges into DFA states"); + System.err.println(" -Xdbgconversion dump lots of info during NFA conversion"); + System.err.println(" -Xmultithreaded run the analysis in 2 threads"); + System.err.println(" -Xnomergestopstates do not merge stop states"); + System.err.println(" -Xdfaverbose generate DFA states in DOT with NFA configs"); + System.err.println(" -Xwatchconversion print a message 
for each NFA before converting"); + System.err.println(" -XdbgST put tags at start/stop of all templates in output"); + System.err.println(" -Xm m max number of rule invocations during conversion"); + System.err.println(" -Xmaxdfaedges m max \"comfortable\" number of edges for single DFA state"); + System.err.println(" -Xconversiontimeout t set NFA conversion timeout for each decision"); + System.err.println(" -Xmaxinlinedfastates m max DFA states before table used rather than inlining"); + System.err.println(" -Xnfastates for nondeterminisms, list NFA states for each path"); + } + + public void setOutputDirectory(String outputDirectory) { + this.outputDirectory = outputDirectory; + } + + /** This method is used by all code generators to create new output + * files. If the outputDir set by -o is not present it will be created. + * The final filename is sensitive to the output directory and + * the directory where the grammar file was found. If -o is /tmp + * and the original grammar file was foo/t.g then output files + * go in /tmp/foo. + * + * The output dir -o spec takes precedence if it's absolute. + * E.g., if the grammar file dir is absolute the output dir is given + * precendence. "-o /tmp /usr/lib/t.g" results in "/tmp/T.java" as + * output (assuming t.g holds T.java). + * + * If no -o is specified, then just write to the directory where the + * grammar file was found. + * + * If outputDirectory==null then write a String. + */ + public Writer getOutputFile(Grammar g, String fileName) throws IOException { + if ( outputDirectory==null ) { + return new StringWriter(); + } + // output directory is a function of where the grammar file lives + // for subdir/T.g, you get subdir here. Well, depends on -o etc... 
+ File outputDir = getOutputDirectory(g.getFileName()); + File outputFile = new File(outputDir, fileName); + + if( !outputDir.exists() ) { + outputDir.mkdirs(); + } + FileWriter fw = new FileWriter(outputFile); + return new BufferedWriter(fw); + } + + public File getOutputDirectory(String fileNameWithPath) { + File outputDir = new File(outputDirectory); + String fileDirectory = getFileDirectory(fileNameWithPath); + if ( outputDirectory!=UNINITIALIZED_DIR ) { + // -o /tmp /var/lib/t.g => /tmp/T.java + // -o subdir/output /usr/lib/t.g => subdir/output/T.java + // -o . /usr/lib/t.g => ./T.java + if ( fileDirectory!=null && + (new File(fileDirectory).isAbsolute() || + fileDirectory.startsWith("~")) || // isAbsolute doesn't count this :( + forceAllFilesToOutputDir + ) + { + // somebody set the dir, it takes precendence; write new file there + outputDir = new File(outputDirectory); + } + else { + // -o /tmp subdir/t.g => /tmp/subdir/t.g + if ( fileDirectory!=null ) { + outputDir = new File(outputDirectory, fileDirectory); + } + else { + outputDir = new File(outputDirectory); + } + } + } + else { + // they didn't specify a -o dir so just write to location + // where grammar is, absolute or relative + String dir = "."; + if ( fileDirectory!=null ) { + dir = fileDirectory; + } + outputDir = new File(dir); + } + return outputDir; + } + + /** Name a file in the -lib dir. Imported grammars and .tokens files */ + public String getLibraryFile(String fileName) throws IOException { + return libDirectory+File.separator+fileName; + } + + public String getLibraryDirectory() { + return libDirectory; + } + + /** Return the directory containing the grammar file for this grammar. + * normally this is a relative path from current directory. People will + * often do "java org.antlr.Tool grammars/*.g3" So the file will be + * "grammars/foo.g3" etc... This method returns "grammars". 
+ */ + public String getFileDirectory(String fileName) { + File f = new File(fileName); + return f.getParent(); + } + + /** Return a File descriptor for vocab file. Look in library or + * in -o output path. antlr -o foo T.g U.g where U needs T.tokens + * won't work unless we look in foo too. + */ + public File getImportedVocabFile(String vocabName) { + File f = new File(getLibraryDirectory(), + File.separator+ + vocabName+ + CodeGenerator.VOCAB_FILE_EXTENSION); + if ( f.exists() ) { + return f; + } + + return new File(outputDirectory+ + File.separator+ + vocabName+ + CodeGenerator.VOCAB_FILE_EXTENSION); + } + + /** If the tool needs to panic/exit, how do we do that? */ + public void panic() { + throw new Error("ANTLR panic"); + } + + /** Return a time stamp string accurate to sec: yyyy-mm-dd hh:mm:ss */ + public static String getCurrentTimeStamp() { + GregorianCalendar calendar = new java.util.GregorianCalendar(); + int y = calendar.get(Calendar.YEAR); + int m = calendar.get(Calendar.MONTH)+1; // zero-based for months + int d = calendar.get(Calendar.DAY_OF_MONTH); + int h = calendar.get(Calendar.HOUR_OF_DAY); + int min = calendar.get(Calendar.MINUTE); + int sec = calendar.get(Calendar.SECOND); + String sy = String.valueOf(y); + String sm = m<10?"0"+m:String.valueOf(m); + String sd = d<10?"0"+d:String.valueOf(d); + String sh = h<10?"0"+h:String.valueOf(h); + String smin = min<10?"0"+min:String.valueOf(min); + String ssec = sec<10?"0"+sec:String.valueOf(sec); + return new StringBuffer().append(sy).append("-").append(sm).append("-") + .append(sd).append(" ").append(sh).append(":").append(smin) + .append(":").append(ssec).toString(); + } + +} diff --git a/antlr_3_1_source/analysis/ActionLabel.java b/antlr_3_1_source/analysis/ActionLabel.java new file mode 100644 index 0000000..1265364 --- /dev/null +++ b/antlr_3_1_source/analysis/ActionLabel.java @@ -0,0 +1,56 @@ +/* + [The "BSD licence"] + Copyright (c) 2005-2008 Terence Parr + All rights reserved. 
+ + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + 3. The name of the author may not be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+*/ +package org.antlr.analysis; + +import org.antlr.tool.GrammarAST; +import org.antlr.tool.Grammar; + +public class ActionLabel extends Label { + public GrammarAST actionAST; + + public ActionLabel(GrammarAST actionAST) { + super(ACTION); + this.actionAST = actionAST; + } + + public boolean isEpsilon() { + return true; // we are to be ignored by analysis 'cept for predicates + } + + public boolean isAction() { + return true; + } + + public String toString() { + return "{"+actionAST+"}"; + } + + public String toString(Grammar g) { + return toString(); + } +} diff --git a/antlr_3_1_source/analysis/AnalysisRecursionOverflowException.java b/antlr_3_1_source/analysis/AnalysisRecursionOverflowException.java new file mode 100644 index 0000000..6403ea9 --- /dev/null +++ b/antlr_3_1_source/analysis/AnalysisRecursionOverflowException.java @@ -0,0 +1,40 @@ +/* + [The "BSD licence"] + Copyright (c) 2005-2008 Terence Parr + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + 3. The name of the author may not be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
+ IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ +package org.antlr.analysis; + +/** An NFA configuration context stack overflowed. */ +public class AnalysisRecursionOverflowException extends RuntimeException { + public DFAState ovfState; + public NFAConfiguration proposedNFAConfiguration; + public AnalysisRecursionOverflowException(DFAState ovfState, + NFAConfiguration proposedNFAConfiguration) + { + this.ovfState = ovfState; + this.proposedNFAConfiguration = proposedNFAConfiguration; + } +} diff --git a/antlr_3_1_source/analysis/AnalysisTimeoutException.java b/antlr_3_1_source/analysis/AnalysisTimeoutException.java new file mode 100644 index 0000000..392b316 --- /dev/null +++ b/antlr_3_1_source/analysis/AnalysisTimeoutException.java @@ -0,0 +1,36 @@ +/* + [The "BSD licence"] + Copyright (c) 2005-2008 Terence Parr + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + 3. The name of the author may not be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ +package org.antlr.analysis; + +/** Analysis took too long; bail out of entire DFA construction. */ +public class AnalysisTimeoutException extends RuntimeException { + public DFA abortedDFA; + public AnalysisTimeoutException(DFA abortedDFA) { + this.abortedDFA = abortedDFA; + } +} diff --git a/antlr_3_1_source/analysis/DFA.java b/antlr_3_1_source/analysis/DFA.java new file mode 100644 index 0000000..e69b99e --- /dev/null +++ b/antlr_3_1_source/analysis/DFA.java @@ -0,0 +1,1061 @@ +/* + [The "BSD licence"] + Copyright (c) 2005-2006 Terence Parr + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + 3. The name of the author may not be used to endorse or promote products + derived from this software without specific prior written permission. 
+
+ THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+*/
+package org.antlr.analysis;
+
+import org.antlr.codegen.CodeGenerator;
+import org.antlr.misc.IntSet;
+import org.antlr.misc.IntervalSet;
+import org.antlr.misc.Utils;
+import org.antlr.runtime.IntStream;
+import org.antlr.stringtemplate.StringTemplate;
+import org.antlr.tool.*;
+
+import java.util.*;
+
+/** A DFA (converted from a grammar's NFA).
+ *  DFAs are used as prediction machines for alternative blocks in all kinds
+ *  of recognizers (lexers, parsers, tree walkers).
+ */
+public class DFA {
+	public static final int REACHABLE_UNKNOWN = -2;
+	public static final int REACHABLE_BUSY = -1; // in process of computing
+	public static final int REACHABLE_NO = 0;
+	public static final int REACHABLE_YES = 1;
+
+	/** Prevent explosion of DFA states during conversion. The max number
+	 *  of states per alt in a single decision's DFA.
+	public static final int MAX_STATES_PER_ALT_IN_DFA = 450;
+	*/
+
+	/** Set to 0 to not terminate early (time in ms) */
+	public static int MAX_TIME_PER_DFA_CREATION = 1*1000;
+
+	/** How many edges can each DFA state have before a "special" state
+	 *  is created that uses IF expressions instead of a table?
+	 */
+	public static int MAX_STATE_TRANSITIONS_FOR_TABLE = 65534;
+
+	/** What's the start state for this DFA?
*/ + public DFAState startState; + + /** This DFA is being built for which decision? */ + public int decisionNumber = 0; + + /** From what NFAState did we create the DFA? */ + public NFAState decisionNFAStartState; + + /** The printable grammar fragment associated with this DFA */ + public String description; + + /** A set of all uniquely-numbered DFA states. Maps hash of DFAState + * to the actual DFAState object. We use this to detect + * existing DFA states. Map. Use Map so + * we can get old state back (Set only allows you to see if it's there). + * Not used during fixed k lookahead as it's a waste to fill it with + * a dup of states array. + */ + protected Map uniqueStates = new HashMap(); + + /** Maps the state number to the actual DFAState. Use a Vector as it + * grows automatically when I set the ith element. This contains all + * states, but the states are not unique. s3 might be same as s1 so + * s3 -> s1 in this table. This is how cycles occur. If fixed k, + * then these states will all be unique as states[i] always points + * at state i when no cycles exist. + * + * This is managed in parallel with uniqueStates and simply provides + * a way to go from state number to DFAState rather than via a + * hash lookup. + */ + protected Vector states = new Vector(); + + /** Unique state numbers per DFA */ + protected int stateCounter = 0; + + /** count only new states not states that were rejected as already present */ + protected int numberOfStates = 0; + + /** User specified max fixed lookahead. If 0, nothing specified. -1 + * implies we have not looked at the options table yet to set k. + */ + protected int user_k = -1; + + /** While building the DFA, track max lookahead depth if not cyclic */ + protected int max_k = -1; + + /** Is this DFA reduced? I.e., can all states lead to an accept state? */ + protected boolean reduced = true; + + /** Are there any loops in this DFA? 
+ * Computed by doesStateReachAcceptState() + */ + protected boolean cyclic = false; + + /** Track whether this DFA has at least one sem/syn pred encountered + * during a closure operation. This is useful for deciding whether + * to retry a non-LL(*) with k=1. If no pred, it will not work w/o + * a pred so don't bother. It would just give another error message. + */ + public boolean predicateVisible = false; + + public boolean hasPredicateBlockedByAction = false; + + /** Each alt in an NFA derived from a grammar must have a DFA state that + * predicts it lest the parser not know what to do. Nondeterminisms can + * lead to this situation (assuming no semantic predicates can resolve + * the problem) and when for some reason, I cannot compute the lookahead + * (which might arise from an error in the algorithm or from + * left-recursion etc...). This list starts out with all alts contained + * and then in method doesStateReachAcceptState() I remove the alts I + * know to be uniquely predicted. + */ + protected List unreachableAlts; + + protected int nAlts = 0; + + /** We only want one accept state per predicted alt; track here */ + protected DFAState[] altToAcceptState; + + /** Track whether an alt discovers recursion for each alt during + * NFA to DFA conversion; >1 alt with recursion implies nonregular. + */ + public IntSet recursiveAltSet = new IntervalSet(); + + /** Which NFA are we converting (well, which piece of the NFA)? */ + public NFA nfa; + + protected NFAToDFAConverter nfaConverter; + + /** This probe tells you a lot about a decision and is useful even + * when there is no error such as when a syntactic nondeterminism + * is solved via semantic predicates. Perhaps a GUI would want + * the ability to show that. + */ + public DecisionProbe probe = new DecisionProbe(this); + + /** Track absolute time of the conversion so we can have a failsafe: + * if it takes too long, then terminate. Assume bugs are in the + * analysis engine. 
+	 */
+	protected long conversionStartTime;
+
+	/** Map an edge transition table to a unique set number; ordered so
+	 *  we can push into the output template as an ordered list of sets
+	 *  and then ref them from within the transition[][] table.  Like this
+	 *  for C# target:
+	 *      public static readonly DFA30_transition0 =
+	 *          new short[] { 46, 46, -1, 46, 46, -1, -1, -1, -1, -1, -1, -1,...};
+	 *      public static readonly DFA30_transition1 =
+	 *          new short[] { 21 };
+	 *      public static readonly short[][] DFA30_transition = {
+	 *          DFA30_transition0,
+	 *          DFA30_transition0,
+	 *          DFA30_transition1,
+	 *          ...
+	 *      };
+	 */
+	public Map edgeTransitionClassMap = new LinkedHashMap();
+
+	/** The unique edge transition class number; every time we see a new
+	 *  set of edges emanating from a state, we number it so we can reuse
+	 *  it if it's ever seen again for another state.  For the Java grammar,
+	 *  some of the big edge transition tables are seen about 57 times.
+	 */
+	protected int edgeTransitionClass = 0;
+
+	/* This DFA can be converted to a transition[state][char] table and
+	 *  the following tables are filled by createStateTables upon request.
+	 *  These are injected into the templates for code generation.
+	 *  See March 25, 2006 entry for description:
+	 *  http://www.antlr.org/blog/antlr3/codegen.tml
+	 *  Often using Vector as can't set ith position in a List and have
+	 *  it extend list size; bizarre.
+	 */
+
+	/** List of special DFAState objects */
+	public List specialStates;
+	/** List of ST for special states. */
+	public List specialStateSTs;
+	public Vector accept;
+	public Vector eot;
+	public Vector eof;
+	public Vector min;
+	public Vector max;
+	public Vector special;
+	public Vector transition;
+	/** just the Vector indicating which unique edge table is at
+	 *  position i.
+ */ + public Vector transitionEdgeTables; // not used by java yet + protected int uniqueCompressedSpecialStateNum = 0; + + /** Which generator to use if we're building state tables */ + protected CodeGenerator generator = null; + + protected DFA() {;} + + public DFA(int decisionNumber, NFAState decisionStartState) { + this.decisionNumber = decisionNumber; + this.decisionNFAStartState = decisionStartState; + nfa = decisionStartState.nfa; + nAlts = nfa.grammar.getNumberOfAltsForDecisionNFA(decisionStartState); + //setOptions( nfa.grammar.getDecisionOptions(getDecisionNumber()) ); + initAltRelatedInfo(); + + //long start = System.currentTimeMillis(); + nfaConverter = new NFAToDFAConverter(this); + try { + nfaConverter.convert(); + + // figure out if there are problems with decision + verify(); + + if ( !probe.isDeterministic() || probe.analysisOverflowed() ) { + probe.issueWarnings(); + } + + // must be after verify as it computes cyclic, needed by this routine + // should be after warnings because early termination or something + // will not allow the reset to operate properly in some cases. + resetStateNumbersToBeContiguous(); + + //long stop = System.currentTimeMillis(); + //System.out.println("verify cost: "+(int)(stop-start)+" ms"); + } + catch (AnalysisTimeoutException at) { + probe.reportAnalysisTimeout(); + if ( !okToRetryDFAWithK1() ) { + probe.issueWarnings(); + } + } + catch (NonLLStarDecisionException nonLL) { + probe.reportNonLLStarDecision(this); + // >1 alt recurses, k=* and no auto backtrack nor manual sem/syn + if ( !okToRetryDFAWithK1() ) { + probe.issueWarnings(); + } + } + } + + /** Walk all states and reset their numbers to be a contiguous sequence + * of integers starting from 0. Only cyclic DFA can have unused positions + * in states list. State i might be identical to a previous state j and + * will result in states[i] == states[j]. We don't want to waste a state + * number on this. Useful mostly for code generation in tables. 
+ * + * At the start of this routine, states[i].stateNumber <= i by definition. + * If states[50].stateNumber is 50 then a cycle during conversion may + * try to add state 103, but we find that an identical DFA state, named + * 50, already exists, hence, states[103]==states[50] and both have + * stateNumber 50 as they point at same object. Afterwards, the set + * of state numbers from all states should represent a contiguous range + * from 0..n-1 where n is the number of unique states. + */ + public void resetStateNumbersToBeContiguous() { + if ( getUserMaxLookahead()>0 ) { + // all numbers are unique already; no states are thrown out. + return; + } + + // walk list of DFAState objects by state number, + // setting state numbers to 0..n-1 + int snum=0; + for (int i = 0; i <= getMaxStateNumber(); i++) { + DFAState s = getState(i); + // some states are unused after creation most commonly due to cycles + // or conflict resolution. + if ( s==null ) { + continue; + } + // state i is mapped to DFAState with state number set to i originally + // so if it's less than i, then we renumbered it already; that + // happens when states have been merged or cycles occurred I think. + // states[50] will point to DFAState with s50 in it but + // states[103] might also point at this same DFAState. Since + // 50 < 103 then it's already been renumbered as it points downwards. + boolean alreadyRenumbered = s.stateNumber> which is the transition[][] table + for (int i = 0; i < transition.size(); i++) { + Vector transitionsForState = (Vector) transition.elementAt(i); + encoded.add(getRunLengthEncoding(transitionsForState)); + } + return encoded; + } + + /** Compress the incoming data list so that runs of same number are + * encoded as number,value pair sequences. 3 -1 -1 -1 28 is encoded + * as 1 3 3 -1 1 28. I am pretty sure this is the lossless compression + * that GIF files use. Transition tables are heavily compressed by + * this technique. 
I got the idea from JFlex http://jflex.de/ + * + * Return List where each string is either \xyz for 8bit char + * and \uFFFF for 16bit. Hideous and specific to Java, but it is the + * only target bad enough to need it. + */ + public List getRunLengthEncoding(List data) { + if ( data==null || data.size()==0 ) { + // for states with no transitions we want an empty string "" + // to hold its place in the transitions array. + List empty = new ArrayList(); + empty.add(""); + return empty; + } + int size = Math.max(2,data.size()/2); + List encoded = new ArrayList(size); // guess at size + // scan values looking for runs + int i = 0; + Integer emptyValue = Utils.integer(-1); + while ( i < data.size() ) { + Integer I = (Integer)data.get(i); + if ( I==null ) { + I = emptyValue; + } + // count how many v there are? + int n = 0; + for (int j = i; j < data.size(); j++) { + Integer v = (Integer)data.get(j); + if ( v==null ) { + v = emptyValue; + } + if ( I.equals(v) ) { + n++; + } + else { + break; + } + } + encoded.add(generator.target.encodeIntAsCharEscape((char)n)); + encoded.add(generator.target.encodeIntAsCharEscape((char)I.intValue())); + i+=n; + } + return encoded; + } + + public void createStateTables(CodeGenerator generator) { + //System.out.println("createTables:\n"+this); + this.generator = generator; + description = getNFADecisionStartState().getDescription(); + description = + generator.target.getTargetStringLiteralFromString(description); + + // create all the tables + special = new Vector(this.getNumberOfStates()); // Vector + special.setSize(this.getNumberOfStates()); + specialStates = new ArrayList(); // List + specialStateSTs = new ArrayList(); // List + accept = new Vector(this.getNumberOfStates()); // Vector + accept.setSize(this.getNumberOfStates()); + eot = new Vector(this.getNumberOfStates()); // Vector + eot.setSize(this.getNumberOfStates()); + eof = new Vector(this.getNumberOfStates()); // Vector + eof.setSize(this.getNumberOfStates()); + min = new 
Vector(this.getNumberOfStates()); // Vector
+		min.setSize(this.getNumberOfStates());
+		max = new Vector(this.getNumberOfStates()); // Vector
+		max.setSize(this.getNumberOfStates());
+		transition = new Vector(this.getNumberOfStates()); // Vector>
+		transition.setSize(this.getNumberOfStates());
+		transitionEdgeTables = new Vector(this.getNumberOfStates()); // Vector>
+		transitionEdgeTables.setSize(this.getNumberOfStates());
+
+		// for each state in the DFA, fill relevant tables.
+		Iterator it = null;
+		if ( getUserMaxLookahead()>0 ) {
+			it = states.iterator();
+		}
+		else {
+			it = getUniqueStates().values().iterator();
+		}
+		while ( it.hasNext() ) {
+			DFAState s = (DFAState)it.next();
+			if ( s==null ) {
+				// ignore null states; some acyclic DFA see this condition
+				// when inlining DFA (due to lack of exit branch pruning?)
+				continue;
+			}
+			if ( s.isAcceptState() ) {
+				// can't compute min,max,special,transition on accepts
+				accept.set(s.stateNumber,
+						   Utils.integer(s.getUniquelyPredictedAlt()));
+			}
+			else {
+				createMinMaxTables(s);
+				createTransitionTableEntryForState(s);
+				createSpecialTable(s);
+				createEOTAndEOFTables(s);
+			}
+		}
+
+		// now that we have computed list of specialStates, gen code for 'em
+		for (int i = 0; i < specialStates.size(); i++) {
+			DFAState ss = (DFAState) specialStates.get(i);
+			StringTemplate stateST =
+				generator.generateSpecialState(ss);
+			specialStateSTs.add(stateST);
+		}
+
+		// check that the tables are not messed up by encode/decode
+		/*
+		testEncodeDecode(min);
+		testEncodeDecode(max);
+		testEncodeDecode(accept);
+		testEncodeDecode(special);
+		System.out.println("min="+min);
+		System.out.println("max="+max);
+		System.out.println("eot="+eot);
+		System.out.println("eof="+eof);
+		System.out.println("accept="+accept);
+		System.out.println("special="+special);
+		System.out.println("transition="+transition);
+		*/
+	}
+
+	/*
+	private void testEncodeDecode(List data) {
+		System.out.println("data="+data);
+		List encoded =
getRunLengthEncoding(data); + StringBuffer buf = new StringBuffer(); + for (int i = 0; i < encoded.size(); i++) { + String I = (String)encoded.get(i); + int v = 0; + if ( I.startsWith("\\u") ) { + v = Integer.parseInt(I.substring(2,I.length()), 16); + } + else { + v = Integer.parseInt(I.substring(1,I.length()), 8); + } + buf.append((char)v); + } + String encodedS = buf.toString(); + short[] decoded = org.antlr.runtime.DFA.unpackEncodedString(encodedS); + //System.out.println("decoded:"); + for (int i = 0; i < decoded.length; i++) { + short x = decoded[i]; + if ( x!=((Integer)data.get(i)).intValue() ) { + System.err.println("problem with encoding"); + } + //System.out.print(", "+x); + } + //System.out.println(); + } + */ + + protected void createMinMaxTables(DFAState s) { + int smin = Label.MAX_CHAR_VALUE + 1; + int smax = Label.MIN_ATOM_VALUE - 1; + for (int j = 0; j < s.getNumberOfTransitions(); j++) { + Transition edge = (Transition) s.transition(j); + Label label = edge.label; + if ( label.isAtom() ) { + if ( label.getAtom()>=Label.MIN_CHAR_VALUE ) { + if ( label.getAtom()smax ) { + smax = label.getAtom(); + } + } + } + else if ( label.isSet() ) { + IntervalSet labels = (IntervalSet)label.getSet(); + int lmin = labels.getMinElement(); + // if valid char (don't do EOF) and less than current min + if ( lmin=Label.MIN_CHAR_VALUE ) { + smin = labels.getMinElement(); + } + if ( labels.getMaxElement()>smax ) { + smax = labels.getMaxElement(); + } + } + } + + if ( smax<0 ) { + // must be predicates or pure EOT transition; just zero out min, max + smin = Label.MIN_CHAR_VALUE; + smax = Label.MIN_CHAR_VALUE; + } + + min.set(s.stateNumber, Utils.integer((char)smin)); + max.set(s.stateNumber, Utils.integer((char)smax)); + + if ( smax<0 || smin>Label.MAX_CHAR_VALUE || smin<0 ) { + ErrorManager.internalError("messed up: min="+min+", max="+max); + } + } + + protected void createTransitionTableEntryForState(DFAState s) { + /* + 
System.out.println("createTransitionTableEntryForState s"+s.stateNumber+ + " dec "+s.dfa.decisionNumber+" cyclic="+s.dfa.isCyclic()); + */ + int smax = ((Integer)max.get(s.stateNumber)).intValue(); + int smin = ((Integer)min.get(s.stateNumber)).intValue(); + + Vector stateTransitions = new Vector(smax-smin+1); + stateTransitions.setSize(smax-smin+1); + transition.set(s.stateNumber, stateTransitions); + for (int j = 0; j < s.getNumberOfTransitions(); j++) { + Transition edge = (Transition) s.transition(j); + Label label = edge.label; + if ( label.isAtom() && label.getAtom()>=Label.MIN_CHAR_VALUE ) { + int labelIndex = label.getAtom()-smin; // offset from 0 + stateTransitions.set(labelIndex, + Utils.integer(edge.target.stateNumber)); + } + else if ( label.isSet() ) { + IntervalSet labels = (IntervalSet)label.getSet(); + int[] atoms = labels.toArray(); + for (int a = 0; a < atoms.length; a++) { + // set the transition if the label is valid (don't do EOF) + if ( atoms[a]>=Label.MIN_CHAR_VALUE ) { + int labelIndex = atoms[a]-smin; // offset from 0 + stateTransitions.set(labelIndex, + Utils.integer(edge.target.stateNumber)); + } + } + } + } + // track unique state transition tables so we can reuse + Integer edgeClass = (Integer)edgeTransitionClassMap.get(stateTransitions); + if ( edgeClass!=null ) { + //System.out.println("we've seen this array before; size="+stateTransitions.size()); + transitionEdgeTables.set(s.stateNumber, edgeClass); + } + else { + edgeClass = Utils.integer(edgeTransitionClass); + transitionEdgeTables.set(s.stateNumber, edgeClass); + edgeTransitionClassMap.put(stateTransitions, edgeClass); + edgeTransitionClass++; + } + } + + /** Set up the EOT and EOF tables; we cannot put -1 min/max values so + * we need another way to test that in the DFA transition function. 
+ */ + protected void createEOTAndEOFTables(DFAState s) { + for (int j = 0; j < s.getNumberOfTransitions(); j++) { + Transition edge = (Transition) s.transition(j); + Label label = edge.label; + if ( label.isAtom() ) { + if ( label.getAtom()==Label.EOT ) { + // eot[s] points to accept state + eot.set(s.stateNumber, Utils.integer(edge.target.stateNumber)); + } + else if ( label.getAtom()==Label.EOF ) { + // eof[s] points to accept state + eof.set(s.stateNumber, Utils.integer(edge.target.stateNumber)); + } + } + else if ( label.isSet() ) { + IntervalSet labels = (IntervalSet)label.getSet(); + int[] atoms = labels.toArray(); + for (int a = 0; a < atoms.length; a++) { + if ( atoms[a]==Label.EOT ) { + // eot[s] points to accept state + eot.set(s.stateNumber, Utils.integer(edge.target.stateNumber)); + } + else if ( atoms[a]==Label.EOF ) { + eof.set(s.stateNumber, Utils.integer(edge.target.stateNumber)); + } + } + } + } + } + + protected void createSpecialTable(DFAState s) { + // number all special states from 0...n-1 instead of their usual numbers + boolean hasSemPred = false; + + // TODO this code is very similar to canGenerateSwitch. 
Refactor to share.
+		for (int j = 0; j < s.getNumberOfTransitions(); j++) {
+			Transition edge = (Transition) s.transition(j);
+			Label label = edge.label;
+			// can't do a switch if the edges have preds or are going to
+			// require gated predicates
+			if ( label.isSemanticPredicate() ||
+				 ((DFAState)edge.target).getGatedPredicatesInNFAConfigurations()!=null)
+			{
+				hasSemPred = true;
+				break;
+			}
+		}
+		// if has pred or too big for table, make it special
+		int smax = ((Integer)max.get(s.stateNumber)).intValue();
+		int smin = ((Integer)min.get(s.stateNumber)).intValue();
+		if ( hasSemPred || smax-smin>MAX_STATE_TRANSITIONS_FOR_TABLE ) {
+			special.set(s.stateNumber,
+						Utils.integer(uniqueCompressedSpecialStateNum));
+			uniqueCompressedSpecialStateNum++;
+			specialStates.add(s);
+		}
+		else {
+			special.set(s.stateNumber, Utils.integer(-1)); // not special
+		}
+	}
+
+	public int predict(IntStream input) {
+		Interpreter interp = new Interpreter(nfa.grammar, input);
+		return interp.predict(this);
+	}
+
+	/** Add a new DFA state to this DFA if not already present.
+	 *  To force an acyclic, fixed maximum depth DFA, just always
+	 *  return the incoming state.  By not reusing old states,
+	 *  no cycles can be created.  If we're doing fixed k lookahead
+	 *  don't update uniqueStates, just return incoming state, which
+	 *  indicates it's a new state.
+	 */
+	protected DFAState addState(DFAState d) {
+		if ( getUserMaxLookahead()>0 ) {
+			return d;
+		}
+		// does a DFA state exist already with everything the same
+		// except its state number?
+		DFAState existing = (DFAState)uniqueStates.get(d);
+		if ( existing != null ) {
+			/*
+			System.out.println("state "+d.stateNumber+" exists as state "+
+							   existing.stateNumber);
+			*/
+			// already there...get the existing DFA state
+			return existing;
+		}
+
+		// if not there, then add new state.
+ uniqueStates.put(d,d); + numberOfStates++; + return d; + } + + public void removeState(DFAState d) { + DFAState it = (DFAState)uniqueStates.remove(d); + if ( it!=null ) { + numberOfStates--; + } + } + + public Map getUniqueStates() { + return uniqueStates; + } + + /** What is the max state number ever created? This may be beyond + * getNumberOfStates(). + */ + public int getMaxStateNumber() { + return states.size()-1; + } + + public DFAState getState(int stateNumber) { + return (DFAState)states.get(stateNumber); + } + + public void setState(int stateNumber, DFAState d) { + states.set(stateNumber, d); + } + + /** Is the DFA reduced? I.e., does every state have a path to an accept + * state? If not, don't delete as we need to generate an error indicating + * which paths are "dead ends". Also tracks list of alts with no accept + * state in the DFA. Must call verify() first before this makes sense. + */ + public boolean isReduced() { + return reduced; + } + + /** Is this DFA cyclic? That is, are there any loops? If not, then + * the DFA is essentially an LL(k) predictor for some fixed, max k value. + * We can build a series of nested IF statements to match this. In the + * presence of cycles, we need to build a general DFA and interpret it + * to distinguish between alternatives. + */ + public boolean isCyclic() { + return cyclic && getUserMaxLookahead()==0; + } + + public boolean canInlineDecision() { + return !isCyclic() && + !probe.isNonLLStarDecision() && + getNumberOfStates() < CodeGenerator.MAX_ACYCLIC_DFA_STATES_INLINE; + } + + /** Is this DFA derived from the NFA for the Tokens rule? 
*/ + public boolean isTokensRuleDecision() { + if ( nfa.grammar.type!=Grammar.LEXER ) { + return false; + } + NFAState nfaStart = getNFADecisionStartState(); + Rule r = nfa.grammar.getLocallyDefinedRule(Grammar.ARTIFICIAL_TOKENS_RULENAME); + NFAState TokensRuleStart = r.startState; + NFAState TokensDecisionStart = + (NFAState)TokensRuleStart.transition[0].target; + return nfaStart == TokensDecisionStart; + } + + /** The user may specify a max, acyclic lookahead for any decision. No + * DFA cycles are created when this value, k, is greater than 0. + * If this decision has no k lookahead specified, then try the grammar. + */ + public int getUserMaxLookahead() { + if ( user_k>=0 ) { // cache for speed + return user_k; + } + user_k = nfa.grammar.getUserMaxLookahead(decisionNumber); + return user_k; + } + + public boolean getAutoBacktrackMode() { + return nfa.grammar.getAutoBacktrackMode(decisionNumber); + } + + public void setUserMaxLookahead(int k) { + this.user_k = k; + } + + /** Return k if decision is LL(k) for some k else return max int */ + public int getMaxLookaheadDepth() { + if ( isCyclic() ) { + return Integer.MAX_VALUE; + } + return max_k; + } + + /** Return a list of Integer alt numbers for which no lookahead could + * be computed or for which no single DFA accept state predicts those + * alts. Must call verify() first before this makes sense. + */ + public List getUnreachableAlts() { + return unreachableAlts; + } + + /** Once this DFA has been built, need to verify that: + * + * 1. it's reduced + * 2. all alts have an accept state + * + * Elsewhere, in the NFA converter, we need to verify that: + * + * 3. alts i and j have disjoint lookahead if no sem preds + * 4. if sem preds, nondeterministic alts must be sufficiently covered + * + * This is avoided if analysis bails out for any reason. 
+ */ + public void verify() { + doesStateReachAcceptState(startState); + } + + /** figure out if this state eventually reaches an accept state and + * modify the instance variable 'reduced' to indicate if we find + * at least one state that cannot reach an accept state. This implies + * that the overall DFA is not reduced. This algorithm should be + * linear in the number of DFA states. + * + * The algorithm also tracks which alternatives have no accept state, + * indicating a nondeterminism. + * + * Also computes whether the DFA is cyclic. + * + * TODO: I call getUniquelyPredicatedAlt too much; cache predicted alt + */ + protected boolean doesStateReachAcceptState(DFAState d) { + if ( d.isAcceptState() ) { + // accept states have no edges emanating from them so we can return + d.setAcceptStateReachable(REACHABLE_YES); + // this alt is uniquely predicted, remove from nondeterministic list + int predicts = d.getUniquelyPredictedAlt(); + unreachableAlts.remove(Utils.integer(predicts)); + return true; + } + + // avoid infinite loops + d.setAcceptStateReachable(REACHABLE_BUSY); + + boolean anEdgeReachesAcceptState = false; + // Visit every transition, track if at least one edge reaches stop state + // Cannot terminate when we know this state reaches stop state since + // all transitions must be traversed to set status of each DFA state. + for (int i=0; i0 ) { + buf.append(" && "); + } + buf.append("timed out (>"); + buf.append(DFA.MAX_TIME_PER_DFA_CREATION); + buf.append("ms)"); + } + buf.append("\n"); + return buf.toString(); + } + + /** What GrammarAST node (derived from the grammar) is this DFA + * associated with? It will point to the start of a block or + * the loop back of a (...)+ block etc... 
+ */ + public GrammarAST getDecisionASTNode() { + return decisionNFAStartState.associatedASTNode; + } + + public boolean isGreedy() { + GrammarAST blockAST = nfa.grammar.getDecisionBlockAST(decisionNumber); + Object v = nfa.grammar.getBlockOption(blockAST,"greedy"); + if ( v!=null && v.equals("false") ) { + return false; + } + return true; + + } + + public DFAState newState() { + DFAState n = new DFAState(this); + n.stateNumber = stateCounter; + stateCounter++; + states.setSize(n.stateNumber+1); + states.set(n.stateNumber, n); // track state num to state + return n; + } + + public int getNumberOfStates() { + if ( getUserMaxLookahead()>0 ) { + // if using fixed lookahead then uniqueSets not set + return states.size(); + } + return numberOfStates; + } + + public int getNumberOfAlts() { + return nAlts; + } + + public boolean analysisTimedOut() { + return probe.analysisTimedOut(); + } + + protected void initAltRelatedInfo() { + unreachableAlts = new LinkedList(); + for (int i = 1; i <= nAlts; i++) { + unreachableAlts.add(Utils.integer(i)); + } + altToAcceptState = new DFAState[nAlts+1]; + } + + public String toString() { + FASerializer serializer = new FASerializer(nfa.grammar); + if ( startState==null ) { + return ""; + } + return serializer.serialize(startState, false); + } + + /** EOT (end of token) is a label that indicates when the DFA conversion + * algorithm would "fall off the end of a lexer rule". It normally + * means the default clause. So for ('a'..'z')+ you would see a DFA + * with a state that has a..z and EOT emanating from it. a..z would + * jump to a state predicting alt 1 and EOT would jump to a state + * predicting alt 2 (the exit loop branch). EOT implies anything other + * than a..z. If for some reason, the set is "all char" such as with + * the wildcard '.', then EOT cannot match anything. For example, + * + * BLOCK : '{' (.)* '}' + * + * consumes all char until EOF when greedy=true. 
When all edges are
+ * combined for the DFA state after matching '}', you will find that
+ * it is all char. The EOT transition has nothing to match and is
+ * unreachable. The findNewDFAStatesAndAddDFATransitions() method
+ * must know to ignore the EOT, so we simply remove it from the
+ * reachable labels. Later analysis will find that the exit branch
+ * is not predicted by anything. For greedy=false, we leave only
+ * the EOT label indicating that the DFA should stop immediately
+ * and predict the exit branch. The reachable labels are often a
+ * set of disjoint values like: [<EOT>, 42, {0..41, 43..65534}]
+ * due to DFA conversion so must construct a pure set to see if
+ * it is same as Label.ALLCHAR.
+ *
+ * Only do this for Lexers.
+ *
+ * If EOT coexists with ALLCHAR:
+ * 1. If not greedy, modify the labels parameter to be EOT
+ * 2. If greedy, remove EOT from the labels set
+ protected boolean reachableLabelsEOTCoexistsWithAllChar(OrderedHashSet labels)
+ {
+ Label eot = new Label(Label.EOT);
+ if ( !labels.containsKey(eot) ) {
+ return false;
+ }
+ System.out.println("### contains EOT");
+ boolean containsAllChar = false;
+ IntervalSet completeVocab = new IntervalSet();
+ int n = labels.size();
+ for (int i=0; iDFA->codegen pipeline seems very robust
+ * to me which I attribute to a uniform and consistent set of data
+ * structures. Regardless of what I want to "say"/implement, I do so
+ * within the confines of, for example, a DFA. The code generator
+ * can then just generate code--it doesn't have to do much thinking.
+ * Putting optimizations in the code gen code really starts to make
+ * it a spaghetti factory (uh oh, now I'm hungry!). The pipeline is
+ * very testable; each stage has well defined input/output pairs.
+ *
+ * ### Optimization: PRUNE_EBNF_EXIT_BRANCHES
+ *
+ * There is no need to test EBNF block exit branches. Not only is it
+ * an unneeded computation, but counter-intuitively, you actually get
+ * better errors.
You can report an error at the missing or extra + * token rather than as soon as you've figured out you will fail. + * + * Imagine optional block "( DOT CLASS )? SEMI". ANTLR generates: + * + * int alt=0; + * if ( input.LA(1)==DOT ) { + * alt=1; + * } + * else if ( input.LA(1)==SEMI ) { + * alt=2; + * } + * + * Clearly, since Parser.match() will ultimately find the error, we + * do not want to report an error nor do we want to bother testing + * lookahead against what follows the (...)? We want to generate + * simply "should I enter the subrule?": + * + * int alt=2; + * if ( input.LA(1)==DOT ) { + * alt=1; + * } + * + * NOTE 1. Greedy loops cannot be optimized in this way. For example, + * "(greedy=false:'x'|.)* '\n'". You specifically need the exit branch + * to tell you when to terminate the loop as the same input actually + * predicts one of the alts (i.e., staying in the loop). + * + * NOTE 2. I do not optimize cyclic DFAs at the moment as it doesn't + * seem to work. ;) I'll have to investigate later to see what work I + * can do on cyclic DFAs to make them have fewer edges. Might have + * something to do with the EOT token. + * + * ### PRUNE_SUPERFLUOUS_EOT_EDGES + * + * When a token is a subset of another such as the following rules, ANTLR + * quietly assumes the first token to resolve the ambiguity. + * + * EQ : '=' ; + * ASSIGNOP : '=' | '+=' ; + * + * It can yield states that have only a single edge on EOT to an accept + * state. This is a waste and messes up my code generation. ;) If + * Tokens rule DFA goes + * + * s0 -'='-> s3 -EOT-> s5 (accept) + * + * then s5 should be pruned and s3 should be made an accept. Do NOT do this + * for keyword versus ID as the state with EOT edge emanating from it will + * also have another edge. + * + * ### Optimization: COLLAPSE_ALL_INCIDENT_EDGES + * + * Done during DFA construction. See method addTransition() in + * NFAToDFAConverter. + * + * ### Optimization: MERGE_STOP_STATES + * + * Done during DFA construction. 
See addDFAState() in NFAToDFAConverter.
+ */
+public class DFAOptimizer {
+ public static boolean PRUNE_EBNF_EXIT_BRANCHES = true;
+ public static boolean PRUNE_TOKENS_RULE_SUPERFLUOUS_EOT_EDGES = true;
+ public static boolean COLLAPSE_ALL_PARALLEL_EDGES = true;
+ public static boolean MERGE_STOP_STATES = true;
+
+ /** Used by DFA state machine generator to avoid infinite recursion
+ * resulting from cycles in the DFA. This is a set of int state #s.
+ * This is a side-effect of calling optimize; can't clear after use
+ * because code gen needs it.
+ */
+ protected Set visited = new HashSet();
+
+ protected Grammar grammar;
+
+ public DFAOptimizer(Grammar grammar) {
+ this.grammar = grammar;
+ }
+
+ public void optimize() {
+ // optimize each DFA in this grammar
+ for (int decisionNumber=1;
+ decisionNumber<=grammar.getNumberOfDecisions();
+ decisionNumber++)
+ {
+ DFA dfa = grammar.getLookaheadDFA(decisionNumber);
+ optimize(dfa);
+ }
+ }
+
+ protected void optimize(DFA dfa) {
+ if ( dfa==null ) {
+ return; // nothing to do
+ }
+ /*
+ System.out.println("Optimize DFA "+dfa.decisionNFAStartState.decisionNumber+
+ " num states="+dfa.getNumberOfStates());
+ */
+ //long start = System.currentTimeMillis();
+ if ( PRUNE_EBNF_EXIT_BRANCHES && dfa.canInlineDecision() ) {
+ visited.clear();
+ int decisionType =
+ dfa.getNFADecisionStartState().decisionStateType;
+ if ( dfa.isGreedy() &&
+ (decisionType==NFAState.OPTIONAL_BLOCK_START ||
+ decisionType==NFAState.LOOPBACK) )
+ {
+ optimizeExitBranches(dfa.startState);
+ }
+ }
+ // If the Tokens rule has syntactically ambiguous rules, try to prune
+ if ( PRUNE_TOKENS_RULE_SUPERFLUOUS_EOT_EDGES &&
+ dfa.isTokensRuleDecision() &&
+ dfa.probe.stateToSyntacticallyAmbiguousTokensRuleAltsMap.size()>0 )
+ {
+ visited.clear();
+ optimizeEOTBranches(dfa.startState);
+ }
+
+ /* ack...code gen needs this, cannot optimize
+ visited.clear();
+ unlinkUnneededStateData(dfa.startState);
+ */
+ //long stop = System.currentTimeMillis();
+ 
//System.out.println("minimized in "+(int)(stop-start)+" ms"); + } + + protected void optimizeExitBranches(DFAState d) { + Integer sI = Utils.integer(d.stateNumber); + if ( visited.contains(sI) ) { + return; // already visited + } + visited.add(sI); + int nAlts = d.dfa.getNumberOfAlts(); + for (int i = 0; i < d.getNumberOfTransitions(); i++) { + Transition edge = (Transition) d.transition(i); + DFAState edgeTarget = ((DFAState)edge.target); + /* + System.out.println(d.stateNumber+"-"+ + edge.label.toString(d.dfa.nfa.grammar)+"->"+ + edgeTarget.stateNumber); + */ + // if target is an accept state and that alt is the exit alt + if ( edgeTarget.isAcceptState() && + edgeTarget.getUniquelyPredictedAlt()==nAlts) + { + /* + System.out.println("ignoring transition "+i+" to max alt "+ + d.dfa.getNumberOfAlts()); + */ + d.removeTransition(i); + i--; // back up one so that i++ of loop iteration stays within bounds + } + optimizeExitBranches(edgeTarget); + } + } + + protected void optimizeEOTBranches(DFAState d) { + Integer sI = Utils.integer(d.stateNumber); + if ( visited.contains(sI) ) { + return; // already visited + } + visited.add(sI); + for (int i = 0; i < d.getNumberOfTransitions(); i++) { + Transition edge = (Transition) d.transition(i); + DFAState edgeTarget = ((DFAState)edge.target); + /* + System.out.println(d.stateNumber+"-"+ + edge.label.toString(d.dfa.nfa.grammar)+"->"+ + edgeTarget.stateNumber); + */ + // if only one edge coming out, it is EOT, and target is accept prune + if ( PRUNE_TOKENS_RULE_SUPERFLUOUS_EOT_EDGES && + edgeTarget.isAcceptState() && + d.getNumberOfTransitions()==1 && + edge.label.isAtom() && + edge.label.getAtom()==Label.EOT ) + { + //System.out.println("state "+d+" can be pruned"); + // remove the superfluous EOT edge + d.removeTransition(i); + d.setAcceptState(true); // make it an accept state + // force it to uniquely predict the originally predicted state + d.cachedUniquelyPredicatedAlt = + edgeTarget.getUniquelyPredictedAlt(); + i--; // 
back up one so that i++ of loop iteration stays within bounds + } + optimizeEOTBranches(edgeTarget); + } + } + + /** Walk DFA states, unlinking the nfa configs and whatever else I + * can to reduce memory footprint. + protected void unlinkUnneededStateData(DFAState d) { + Integer sI = Utils.integer(d.stateNumber); + if ( visited.contains(sI) ) { + return; // already visited + } + visited.add(sI); + d.nfaConfigurations = null; + for (int i = 0; i < d.getNumberOfTransitions(); i++) { + Transition edge = (Transition) d.transition(i); + DFAState edgeTarget = ((DFAState)edge.target); + unlinkUnneededStateData(edgeTarget); + } + } + */ + +} diff --git a/antlr_3_1_source/analysis/DFAState.java b/antlr_3_1_source/analysis/DFAState.java new file mode 100644 index 0000000..4c2085b --- /dev/null +++ b/antlr_3_1_source/analysis/DFAState.java @@ -0,0 +1,776 @@ +/* + [The "BSD licence"] + Copyright (c) 2005-2006 Terence Parr + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + 3. The name of the author may not be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
+ IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ +package org.antlr.analysis; + +import org.antlr.misc.IntSet; +import org.antlr.misc.MultiMap; +import org.antlr.misc.OrderedHashSet; +import org.antlr.misc.Utils; +import org.antlr.tool.Grammar; + +import java.util.*; + +/** A DFA state represents a set of possible NFA configurations. + * As Aho, Sethi, Ullman p. 117 says "The DFA uses its state + * to keep track of all possible states the NFA can be in after + * reading each input symbol. That is to say, after reading + * input a1a2..an, the DFA is in a state that represents the + * subset T of the states of the NFA that are reachable from the + * NFA's start state along some path labeled a1a2..an." + * In conventional NFA->DFA conversion, therefore, the subset T + * would be a bitset representing the set of states the + * NFA could be in. We need to track the alt predicted by each + * state as well, however. More importantly, we need to maintain + * a stack of states, tracking the closure operations as they + * jump from rule to rule, emulating rule invocations (method calls). + * Recall that NFAs do not normally have a stack like a pushdown-machine + * so I have to add one to simulate the proper lookahead sequences for + * the underlying LL grammar from which the NFA was derived. + * + * I use a list of NFAConfiguration objects. An NFAConfiguration + * is both a state (ala normal conversion) and an NFAContext describing + * the chain of rules (if any) followed to arrive at that state. 
There
+ * is also the semantic context, which is the "set" of predicates found
+ * on the path to this configuration.
+ *
+ * A DFA state may have multiple references to a particular state,
+ * but with different NFAContexts (with same or different alts)
+ * meaning that state was reached via a different set of rule invocations.
+ */
+public class DFAState extends State {
+ public static final int INITIAL_NUM_TRANSITIONS = 4;
+ public static final int PREDICTED_ALT_UNSET = NFA.INVALID_ALT_NUMBER-1;
+
+ /** We are part of what DFA? Use this ref to get access to the
+ * context trees for an alt.
+ */
+ public DFA dfa;
+
+ /** Track the transitions emanating from this DFA state. The List
+ * elements are Transition objects.
+ */
+ protected List transitions =
+ new ArrayList(INITIAL_NUM_TRANSITIONS);
+
+ /** When doing an acyclic DFA, this is the number of lookahead symbols
+ * consumed to reach this state. This value may be nonzero for most
+ * dfa states, but it is only a valid value if the user has specified
+ * a max fixed lookahead.
+ */
+ protected int k;
+
+ /** The NFA->DFA algorithm may terminate leaving some states
+ * without a path to an accept state, implying that upon certain
+ * input, the decision is not deterministic--no decision about
+ * predicting a unique alternative can be made. Recall that an
+ * accept state is one in which a unique alternative is predicted.
+ */
+ protected int acceptStateReachable = DFA.REACHABLE_UNKNOWN;
+
+ /** Rather than recheck every NFA configuration in a DFA state (after
+ * resolving) in findNewDFAStatesAndAddDFATransitions just check
+ * this boolean. Saves a linear walk per DFA state creation.
+ * Every little bit helps.
+ */
+ protected boolean resolvedWithPredicates = false;
+
+ /** If a closure operation finds that we tried to invoke the same
+ * rule too many times (stack would grow beyond a threshold), it
+ * marks the state as aborted and notifies the DecisionProbe.
+ */
+ public boolean abortedDueToRecursionOverflow = false;
+
+ /** If we detect recursion on more than one alt, decision is non-LL(*),
+ * but try to isolate it to only those states whose closure operations
+ * detect recursion. There may be other alts that are cool:
+ *
+ * a : recur '.'
+ * | recur ';'
+ * | X Y // LL(2) decision; don't abort and use k=1 plus backtracking
+ * | X Z
+ * ;
+ *
+ * 12/13/2007: Actually this has caused problems. If k=*, must terminate
+ * and throw out entire DFA; retry with k=1. Since recursive, do not
+ * attempt more closure ops as it may take forever. Exception thrown
+ * now and we simply report the problem. If synpreds exist, I'll retry
+ * with k=1.
+ */
+ protected boolean abortedDueToMultipleRecursiveAlts = false;
+
+ /** Build up the hash code for this state as NFA configurations
+ * are added, as it is a monotonically increasing list of configurations.
+ */
+ protected int cachedHashCode;
+
+ protected int cachedUniquelyPredicatedAlt = PREDICTED_ALT_UNSET;
+
+ public int minAltInConfigurations=Integer.MAX_VALUE;
+
+ public boolean atLeastOneConfigurationHasAPredicate = false;
+
+ /** The set of NFA configurations (state,alt,context) for this DFA state */
+ public OrderedHashSet nfaConfigurations =
+ new OrderedHashSet();
+
+ public List configurationsWithLabeledEdges =
+ new ArrayList();
+
+ /** Used to prevent the closure operation from looping to itself and
+ * hence looping forever. Sensitive to the NFA state, the alt, and
+ * the stack context. This is just the nfa config set because we want to
+ * prevent closures only on states contributed by closure not reach
+ * operations.
+ *
+ * Two configurations identical including semantic context are
+ * considered the same closure computation. @see NFAToDFAConverter.closureBusy().
+ */ + protected Set closureBusy = new HashSet(); + + /** As this state is constructed (i.e., as NFA states are added), we + * can easily check for non-epsilon transitions because the only + * transition that could be a valid label is transition(0). When we + * process this node eventually, we'll have to walk all states looking + * for all possible transitions. That is of the order: size(label space) + * times size(nfa states), which can be pretty damn big. It's better + * to simply track possible labels. + */ + protected OrderedHashSet