mirror of
https://github.com/Combodo/iTop.git
synced 2026-04-23 18:48:51 +02:00
⬆️ Upgrade lib : nikic/php-parser
We were on v3 that is no longer maintained and compatibility is annonced for PHP 7.2. v4 is active and supports PHP up to 8.0 No problem to update as this is only used in the config editor (\Combodo\iTop\Config\Validator\iTopConfigAstValidator)
This commit is contained in:
@@ -1,80 +0,0 @@
|
||||
Introduction
|
||||
============
|
||||
|
||||
This project is a PHP 5.2 to PHP 7.1 parser **written in PHP itself**.
|
||||
|
||||
What is this for?
|
||||
-----------------
|
||||
|
||||
A parser is useful for [static analysis][0], manipulation of code and basically any other
|
||||
application dealing with code programmatically. A parser constructs an [Abstract Syntax Tree][1]
|
||||
(AST) of the code and thus allows dealing with it in an abstract and robust way.
|
||||
|
||||
There are other ways of processing source code. One that PHP supports natively is using the
|
||||
token stream generated by [`token_get_all`][2]. The token stream is much more low level than
|
||||
the AST and thus has different applications: It allows to also analyze the exact formatting of
|
||||
a file. On the other hand the token stream is much harder to deal with for more complex analysis.
|
||||
For example an AST abstracts away the fact that in PHP variables can be written as `$foo`, but also
|
||||
as `$$bar`, `${'foobar'}` or even `${!${''}=barfoo()}`. You don't have to worry about recognizing
|
||||
all the different syntaxes from a stream of tokens.
|
||||
|
||||
Another question is: Why would I want to have a PHP parser *written in PHP*? Well, PHP might not be
|
||||
a language especially suited for fast parsing, but processing the AST is much easier in PHP than it
|
||||
would be in other, faster languages like C. Furthermore the people most probably wanting to do
|
||||
programmatic PHP code analysis are incidentally PHP developers, not C developers.
|
||||
|
||||
What can it parse?
|
||||
------------------
|
||||
|
||||
The parser supports parsing PHP 5.2-5.6 and PHP 7.
|
||||
|
||||
As the parser is based on the tokens returned by `token_get_all` (which is only able to lex the PHP
|
||||
version it runs on), additionally a wrapper for emulating tokens from newer versions is provided.
|
||||
This allows to parse PHP 7.1 source code running on PHP 5.5, for example. This emulation is somewhat
|
||||
hacky and not perfect, but it should work well on any sane code.
|
||||
|
||||
What output does it produce?
|
||||
----------------------------
|
||||
|
||||
The parser produces an [Abstract Syntax Tree][1] (AST) also known as a node tree. How this looks like
|
||||
can best be seen in an example. The program `<?php echo 'Hi', 'World';` will give you a node tree
|
||||
roughly looking like this:
|
||||
|
||||
```
|
||||
array(
|
||||
0: Stmt_Echo(
|
||||
exprs: array(
|
||||
0: Scalar_String(
|
||||
value: Hi
|
||||
)
|
||||
1: Scalar_String(
|
||||
value: World
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
```
|
||||
|
||||
This matches the structure of the code: An echo statement, which takes two strings as expressions,
|
||||
with the values `Hi` and `World!`.
|
||||
|
||||
You can also see that the AST does not contain any whitespace information (but most comments are saved).
|
||||
So using it for formatting analysis is not possible.
|
||||
|
||||
What else can it do?
|
||||
--------------------
|
||||
|
||||
Apart from the parser itself this package also bundles support for some other, related features:
|
||||
|
||||
* Support for pretty printing, which is the act of converting an AST into PHP code. Please note
|
||||
that "pretty printing" does not imply that the output is especially pretty. It's just how it's
|
||||
called ;)
|
||||
* Support for serializing and unserializing the node tree to XML
|
||||
* Support for dumping the node tree in a human readable form (see the section above for an
|
||||
example of how the output looks like)
|
||||
* Infrastructure for traversing and changing the AST (node traverser and node visitors)
|
||||
* A node visitor for resolving namespaced names
|
||||
|
||||
[0]: http://en.wikipedia.org/wiki/Static_program_analysis
|
||||
[1]: http://en.wikipedia.org/wiki/Abstract_syntax_tree
|
||||
[2]: http://php.net/token_get_all
|
||||
@@ -1,438 +0,0 @@
|
||||
Usage of basic components
|
||||
=========================
|
||||
|
||||
This document explains how to use the parser, the pretty printer and the node traverser.
|
||||
|
||||
Bootstrapping
|
||||
-------------
|
||||
|
||||
To bootstrap the library, include the autoloader generated by composer:
|
||||
|
||||
```php
|
||||
require 'path/to/vendor/autoload.php';
|
||||
```
|
||||
|
||||
Additionally you may want to set the `xdebug.max_nesting_level` ini option to a higher value:
|
||||
|
||||
```php
|
||||
ini_set('xdebug.max_nesting_level', 3000);
|
||||
```
|
||||
|
||||
This ensures that there will be no errors when traversing highly nested node trees. However, it is
|
||||
preferable to disable XDebug completely, as it can easily make this library more than five times
|
||||
slower.
|
||||
|
||||
Parsing
|
||||
-------
|
||||
|
||||
In order to parse code, you first have to create a parser instance:
|
||||
|
||||
```php
|
||||
use PhpParser\ParserFactory;
|
||||
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
|
||||
```
|
||||
|
||||
The factory accepts a kind argument, that determines how different PHP versions are treated:
|
||||
|
||||
Kind | Behavior
|
||||
-----|---------
|
||||
`ParserFactory::PREFER_PHP7` | Try to parse code as PHP 7. If this fails, try to parse it as PHP 5.
|
||||
`ParserFactory::PREFER_PHP5` | Try to parse code as PHP 5. If this fails, try to parse it as PHP 7.
|
||||
`ParserFactory::ONLY_PHP7` | Parse code as PHP 7.
|
||||
`ParserFactory::ONLY_PHP5` | Parse code as PHP 5.
|
||||
|
||||
Unless you have strong reason to use something else, `PREFER_PHP7` is a reasonable default.
|
||||
|
||||
The `create()` method optionally accepts a `Lexer` instance as the second argument. Some use cases
|
||||
that require customized lexers are discussed in the [lexer documentation](component/Lexer.markdown).
|
||||
|
||||
Subsequently you can pass PHP code (including the opening `<?php` tag) to the `parse` method in order to
|
||||
create a syntax tree. If a syntax error is encountered, an `PhpParser\Error` exception will be thrown:
|
||||
|
||||
```php
|
||||
use PhpParser\Error;
|
||||
use PhpParser\ParserFactory;
|
||||
|
||||
$code = '<?php // some code';
|
||||
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
|
||||
|
||||
try {
|
||||
$stmts = $parser->parse($code);
|
||||
// $stmts is an array of statement nodes
|
||||
} catch (Error $e) {
|
||||
echo 'Parse Error: ', $e->getMessage();
|
||||
}
|
||||
```
|
||||
|
||||
A parser instance can be reused to parse multiple files.
|
||||
|
||||
Node tree
|
||||
---------
|
||||
|
||||
If you use the above code with `$code = "<?php echo 'Hi ', hi\\getTarget();"` the parser will
|
||||
generate a node tree looking like this:
|
||||
|
||||
```
|
||||
array(
|
||||
0: Stmt_Echo(
|
||||
exprs: array(
|
||||
0: Scalar_String(
|
||||
value: Hi
|
||||
)
|
||||
1: Expr_FuncCall(
|
||||
name: Name(
|
||||
parts: array(
|
||||
0: hi
|
||||
1: getTarget
|
||||
)
|
||||
)
|
||||
args: array(
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
```
|
||||
|
||||
Thus `$stmts` will contain an array with only one node, with this node being an instance of
|
||||
`PhpParser\Node\Stmt\Echo_`.
|
||||
|
||||
As PHP is a large language there are approximately 140 different nodes. In order to make work
|
||||
with them easier they are grouped into three categories:
|
||||
|
||||
* `PhpParser\Node\Stmt`s are statement nodes, i.e. language constructs that do not return
|
||||
a value and can not occur in an expression. For example a class definition is a statement.
|
||||
It doesn't return a value and you can't write something like `func(class A {});`.
|
||||
* `PhpParser\Node\Expr`s are expression nodes, i.e. language constructs that return a value
|
||||
and thus can occur in other expressions. Examples of expressions are `$var`
|
||||
(`PhpParser\Node\Expr\Variable`) and `func()` (`PhpParser\Node\Expr\FuncCall`).
|
||||
* `PhpParser\Node\Scalar`s are nodes representing scalar values, like `'string'`
|
||||
(`PhpParser\Node\Scalar\String_`), `0` (`PhpParser\Node\Scalar\LNumber`) or magic constants
|
||||
like `__FILE__` (`PhpParser\Node\Scalar\MagicConst\File`). All `PhpParser\Node\Scalar`s extend
|
||||
`PhpParser\Node\Expr`, as scalars are expressions, too.
|
||||
* There are some nodes not in either of these groups, for example names (`PhpParser\Node\Name`)
|
||||
and call arguments (`PhpParser\Node\Arg`).
|
||||
|
||||
Some node class names have a trailing `_`. This is used whenever the class name would otherwise clash
|
||||
with a PHP keyword.
|
||||
|
||||
Every node has a (possibly zero) number of subnodes. You can access subnodes by writing
|
||||
`$node->subNodeName`. The `Stmt\Echo_` node has only one subnode `exprs`. So in order to access it
|
||||
in the above example you would write `$stmts[0]->exprs`. If you wanted to access the name of the function
|
||||
call, you would write `$stmts[0]->exprs[1]->name`.
|
||||
|
||||
All nodes also define a `getType()` method that returns the node type. The type is the class name
|
||||
without the `PhpParser\Node\` prefix and `\` replaced with `_`. It also does not contain a trailing
|
||||
`_` for reserved-keyword class names.
|
||||
|
||||
It is possible to associate custom metadata with a node using the `setAttribute()` method. This data
|
||||
can then be retrieved using `hasAttribute()`, `getAttribute()` and `getAttributes()`.
|
||||
|
||||
By default the lexer adds the `startLine`, `endLine` and `comments` attributes. `comments` is an array
|
||||
of `PhpParser\Comment[\Doc]` instances.
|
||||
|
||||
The start line can also be accessed using `getLine()`/`setLine()` (instead of `getAttribute('startLine')`).
|
||||
The last doc comment from the `comments` attribute can be obtained using `getDocComment()`.
|
||||
|
||||
Pretty printer
|
||||
--------------
|
||||
|
||||
The pretty printer component compiles the AST back to PHP code. As the parser does not retain formatting
|
||||
information the formatting is done using a specified scheme. Currently there is only one scheme available,
|
||||
namely `PhpParser\PrettyPrinter\Standard`.
|
||||
|
||||
```php
|
||||
use PhpParser\Error;
|
||||
use PhpParser\ParserFactory;
|
||||
use PhpParser\PrettyPrinter;
|
||||
|
||||
$code = "<?php echo 'Hi ', hi\\getTarget();";
|
||||
|
||||
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
|
||||
$prettyPrinter = new PrettyPrinter\Standard;
|
||||
|
||||
try {
|
||||
// parse
|
||||
$stmts = $parser->parse($code);
|
||||
|
||||
// change
|
||||
$stmts[0] // the echo statement
|
||||
->exprs // sub expressions
|
||||
[0] // the first of them (the string node)
|
||||
->value // it's value, i.e. 'Hi '
|
||||
= 'Hello '; // change to 'Hello '
|
||||
|
||||
// pretty print
|
||||
$code = $prettyPrinter->prettyPrint($stmts);
|
||||
|
||||
echo $code;
|
||||
} catch (Error $e) {
|
||||
echo 'Parse Error: ', $e->getMessage();
|
||||
}
|
||||
```
|
||||
|
||||
The above code will output:
|
||||
|
||||
<?php echo 'Hello ', hi\getTarget();
|
||||
|
||||
As you can see the source code was first parsed using `PhpParser\Parser->parse()`, then changed and then
|
||||
again converted to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.
|
||||
|
||||
The `prettyPrint()` method pretty prints a statements array. It is also possible to pretty print only a
|
||||
single expression using `prettyPrintExpr()`.
|
||||
|
||||
The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag
|
||||
and handle inline HTML as the first/last statement more gracefully.
|
||||
|
||||
Node traversation
|
||||
-----------------
|
||||
|
||||
The above pretty printing example used the fact that the source code was known and thus it was easy to
|
||||
write code that accesses a certain part of a node tree and changes it. Normally this is not the case.
|
||||
Usually you want to change / analyze code in a generic way, where you don't know how the node tree is
|
||||
going to look like.
|
||||
|
||||
For this purpose the parser provides a component for traversing and visiting the node tree. The basic
|
||||
structure of a program using this `PhpParser\NodeTraverser` looks like this:
|
||||
|
||||
```php
|
||||
use PhpParser\NodeTraverser;
|
||||
use PhpParser\ParserFactory;
|
||||
use PhpParser\PrettyPrinter;
|
||||
|
||||
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
|
||||
$traverser = new NodeTraverser;
|
||||
$prettyPrinter = new PrettyPrinter\Standard;
|
||||
|
||||
// add your visitor
|
||||
$traverser->addVisitor(new MyNodeVisitor);
|
||||
|
||||
try {
|
||||
$code = file_get_contents($fileName);
|
||||
|
||||
// parse
|
||||
$stmts = $parser->parse($code);
|
||||
|
||||
// traverse
|
||||
$stmts = $traverser->traverse($stmts);
|
||||
|
||||
// pretty print
|
||||
$code = $prettyPrinter->prettyPrintFile($stmts);
|
||||
|
||||
echo $code;
|
||||
} catch (PhpParser\Error $e) {
|
||||
echo 'Parse Error: ', $e->getMessage();
|
||||
}
|
||||
```
|
||||
|
||||
The corresponding node visitor might look like this:
|
||||
|
||||
```php
|
||||
use PhpParser\Node;
|
||||
use PhpParser\NodeVisitorAbstract;
|
||||
|
||||
class MyNodeVisitor extends NodeVisitorAbstract
|
||||
{
|
||||
public function leaveNode(Node $node) {
|
||||
if ($node instanceof Node\Scalar\String_) {
|
||||
$node->value = 'foo';
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The above node visitor would change all string literals in the program to `'foo'`.
|
||||
|
||||
All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four
|
||||
methods:
|
||||
|
||||
```php
|
||||
public function beforeTraverse(array $nodes);
|
||||
public function enterNode(\PhpParser\Node $node);
|
||||
public function leaveNode(\PhpParser\Node $node);
|
||||
public function afterTraverse(array $nodes);
|
||||
```
|
||||
|
||||
The `beforeTraverse()` method is called once before the traversal begins and is passed the nodes the
|
||||
traverser was called with. This method can be used for resetting values before traversation or
|
||||
preparing the tree for traversal.
|
||||
|
||||
The `afterTraverse()` method is similar to the `beforeTraverse()` method, with the only difference that
|
||||
it is called once after the traversal.
|
||||
|
||||
The `enterNode()` and `leaveNode()` methods are called on every node, the former when it is entered,
|
||||
i.e. before its subnodes are traversed, the latter when it is left.
|
||||
|
||||
All four methods can either return the changed node or not return at all (i.e. `null`) in which
|
||||
case the current node is not changed.
|
||||
|
||||
The `enterNode()` method can additionally return the value `NodeTraverser::DONT_TRAVERSE_CHILDREN`,
|
||||
which instructs the traverser to skip all children of the current node.
|
||||
|
||||
The `leaveNode()` method can additionally return the value `NodeTraverser::REMOVE_NODE`, in which
|
||||
case the current node will be removed from the parent array. Furthermore it is possible to return
|
||||
an array of nodes, which will be merged into the parent array at the offset of the current node.
|
||||
I.e. if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)` the result will
|
||||
be `array(A, X, Y, Z, C)`.
|
||||
|
||||
Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract`
|
||||
class, which will define empty default implementations for all the above methods.
|
||||
|
||||
The NameResolver node visitor
|
||||
-----------------------------
|
||||
|
||||
One visitor is already bundled with the package: `PhpParser\NodeVisitor\NameResolver`. This visitor
|
||||
helps you work with namespaced code by trying to resolve most names to fully qualified ones.
|
||||
|
||||
For example, consider the following code:
|
||||
|
||||
use A as B;
|
||||
new B\C();
|
||||
|
||||
In order to know that `B\C` really is `A\C` you would need to track aliases and namespaces yourself.
|
||||
The `NameResolver` takes care of that and resolves names as far as possible.
|
||||
|
||||
After running it most names will be fully qualified. The only names that will stay unqualified are
|
||||
unqualified function and constant names. These are resolved at runtime and thus the visitor can't
|
||||
know which function they are referring to. In most cases this is a non-issue as the global functions
|
||||
are meant.
|
||||
|
||||
Also the `NameResolver` adds a `namespacedName` subnode to class, function and constant declarations
|
||||
that contains the namespaced name instead of only the shortname that is available via `name`.
|
||||
|
||||
Example: Converting namespaced code to pseudo namespaces
|
||||
--------------------------------------------------------
|
||||
|
||||
A small example to understand the concept: We want to convert namespaced code to pseudo namespaces
|
||||
so it works on 5.2, i.e. names like `A\\B` should be converted to `A_B`. Note that such conversions
|
||||
are fairly complicated if you take PHP's dynamic features into account, so our conversion will
|
||||
assume that no dynamic features are used.
|
||||
|
||||
We start off with the following base code:
|
||||
|
||||
```php
|
||||
use PhpParser\ParserFactory;
|
||||
use PhpParser\PrettyPrinter;
|
||||
use PhpParser\NodeTraverser;
|
||||
use PhpParser\NodeVisitor\NameResolver;
|
||||
|
||||
$inDir = '/some/path';
|
||||
$outDir = '/some/other/path';
|
||||
|
||||
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
|
||||
$traverser = new NodeTraverser;
|
||||
$prettyPrinter = new PrettyPrinter\Standard;
|
||||
|
||||
$traverser->addVisitor(new NameResolver); // we will need resolved names
|
||||
$traverser->addVisitor(new NamespaceConverter); // our own node visitor
|
||||
|
||||
// iterate over all .php files in the directory
|
||||
$files = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($inDir));
|
||||
$files = new \RegexIterator($files, '/\.php$/');
|
||||
|
||||
foreach ($files as $file) {
|
||||
try {
|
||||
// read the file that should be converted
|
||||
$code = file_get_contents($file);
|
||||
|
||||
// parse
|
||||
$stmts = $parser->parse($code);
|
||||
|
||||
// traverse
|
||||
$stmts = $traverser->traverse($stmts);
|
||||
|
||||
// pretty print
|
||||
$code = $prettyPrinter->prettyPrintFile($stmts);
|
||||
|
||||
// write the converted file to the target directory
|
||||
file_put_contents(
|
||||
substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)),
|
||||
$code
|
||||
);
|
||||
} catch (PhpParser\Error $e) {
|
||||
echo 'Parse Error: ', $e->getMessage();
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Now lets start with the main code, the `NodeVisitor\NamespaceConverter`. One thing it needs to do
|
||||
is convert `A\\B` style names to `A_B` style ones.
|
||||
|
||||
```php
|
||||
use PhpParser\Node;
|
||||
|
||||
class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
|
||||
{
|
||||
public function leaveNode(Node $node) {
|
||||
if ($node instanceof Node\Name) {
|
||||
return new Node\Name($node->toString('_'));
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The above code profits from the fact that the `NameResolver` already resolved all names as far as
|
||||
possible, so we don't need to do that. We only need to create a string with the name parts separated
|
||||
by underscores instead of backslashes. This is what `$node->toString('_')` does. (If you want to
|
||||
create a name with backslashes either write `$node->toString()` or `(string) $node`.) Then we create
|
||||
a new name from the string and return it. Returning a new node replaces the old node.
|
||||
|
||||
Another thing we need to do is change the class/function/const declarations. Currently they contain
|
||||
only the shortname (i.e. the last part of the name), but they need to contain the complete name including
|
||||
the namespace prefix:
|
||||
|
||||
```php
|
||||
use PhpParser\Node;
|
||||
use PhpParser\Node\Stmt;
|
||||
|
||||
class NodeVisitor_NamespaceConverter extends \PhpParser\NodeVisitorAbstract
|
||||
{
|
||||
public function leaveNode(Node $node) {
|
||||
if ($node instanceof Node\Name) {
|
||||
return new Node\Name($node->toString('_'));
|
||||
} elseif ($node instanceof Stmt\Class_
|
||||
|| $node instanceof Stmt\Interface_
|
||||
|| $node instanceof Stmt\Function_) {
|
||||
$node->name = $node->namespacedName->toString('_');
|
||||
} elseif ($node instanceof Stmt\Const_) {
|
||||
foreach ($node->consts as $const) {
|
||||
$const->name = $const->namespacedName->toString('_');
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
There is not much more to it than converting the namespaced name to string with `_` as separator.
|
||||
|
||||
The last thing we need to do is remove the `namespace` and `use` statements:
|
||||
|
||||
```php
|
||||
use PhpParser\Node;
|
||||
use PhpParser\Node\Stmt;
|
||||
|
||||
class NodeVisitor_NamespaceConverter extends \PhpParser\NodeVisitorAbstract
|
||||
{
|
||||
public function leaveNode(Node $node) {
|
||||
if ($node instanceof Node\Name) {
|
||||
return new Node\Name($node->toString('_'));
|
||||
} elseif ($node instanceof Stmt\Class_
|
||||
|| $node instanceof Stmt\Interface_
|
||||
|| $node instanceof Stmt\Function_) {
|
||||
$node->name = $node->namespacedName->toString('_');
|
||||
} elseif ($node instanceof Stmt\Const_) {
|
||||
foreach ($node->consts as $const) {
|
||||
$const->name = $const->namespacedName->toString('_');
|
||||
}
|
||||
} elseif ($node instanceof Stmt\Namespace_) {
|
||||
// returning an array merges is into the parent array
|
||||
return $node->stmts;
|
||||
} elseif ($node instanceof Stmt\Use_) {
|
||||
// returning false removed the node altogether
|
||||
return false;
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
That's all.
|
||||
@@ -1,330 +0,0 @@
|
||||
Other node tree representations
|
||||
===============================
|
||||
|
||||
It is possible to convert the AST into several textual representations, which serve different uses.
|
||||
|
||||
Simple serialization
|
||||
--------------------
|
||||
|
||||
It is possible to serialize the node tree using `serialize()` and also unserialize it using
|
||||
`unserialize()`. The output is not human readable and not easily processable from anything
|
||||
but PHP, but it is compact and generates quickly. The main application thus is in caching.
|
||||
|
||||
Human readable dumping
|
||||
----------------------
|
||||
|
||||
Furthermore it is possible to dump nodes into a human readable format using the `dump` method of
|
||||
`PhpParser\NodeDumper`. This can be used for debugging.
|
||||
|
||||
```php
|
||||
$code = <<<'CODE'
|
||||
<?php
|
||||
|
||||
function printLine($msg) {
|
||||
echo $msg, "\n";
|
||||
}
|
||||
|
||||
printLine('Hello World!!!');
|
||||
CODE;
|
||||
|
||||
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7);
|
||||
$nodeDumper = new PhpParser\NodeDumper;
|
||||
|
||||
try {
|
||||
$stmts = $parser->parse($code);
|
||||
|
||||
echo $nodeDumper->dump($stmts), "\n";
|
||||
} catch (PhpParser\Error $e) {
|
||||
echo 'Parse Error: ', $e->getMessage();
|
||||
}
|
||||
```
|
||||
|
||||
The above script will have an output looking roughly like this:
|
||||
|
||||
```
|
||||
array(
|
||||
0: Stmt_Function(
|
||||
byRef: false
|
||||
params: array(
|
||||
0: Param(
|
||||
name: msg
|
||||
default: null
|
||||
type: null
|
||||
byRef: false
|
||||
)
|
||||
)
|
||||
stmts: array(
|
||||
0: Stmt_Echo(
|
||||
exprs: array(
|
||||
0: Expr_Variable(
|
||||
name: msg
|
||||
)
|
||||
1: Scalar_String(
|
||||
value:
|
||||
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
name: printLine
|
||||
)
|
||||
1: Expr_FuncCall(
|
||||
name: Name(
|
||||
parts: array(
|
||||
0: printLine
|
||||
)
|
||||
)
|
||||
args: array(
|
||||
0: Arg(
|
||||
value: Scalar_String(
|
||||
value: Hello World!!!
|
||||
)
|
||||
byRef: false
|
||||
)
|
||||
)
|
||||
)
|
||||
)
|
||||
```
|
||||
|
||||
JSON encoding
|
||||
-------------
|
||||
|
||||
Nodes (and comments) implement the `JsonSerializable` interface. As such, it is possible to JSON
|
||||
encode the AST directly using `json_encode()`:
|
||||
|
||||
```php
|
||||
$code = <<<'CODE'
|
||||
<?php
|
||||
|
||||
function printLine($msg) {
|
||||
echo $msg, "\n";
|
||||
}
|
||||
|
||||
printLine('Hello World!!!');
|
||||
CODE;
|
||||
|
||||
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7);
|
||||
$nodeDumper = new PhpParser\NodeDumper;
|
||||
|
||||
try {
|
||||
$stmts = $parser->parse($code);
|
||||
|
||||
echo json_encode($stmts, JSON_PRETTY_PRINT), "\n";
|
||||
} catch (PhpParser\Error $e) {
|
||||
echo 'Parse Error: ', $e->getMessage();
|
||||
}
|
||||
```
|
||||
|
||||
This will result in the following output (which includes attributes):
|
||||
|
||||
```json
|
||||
[
|
||||
{
|
||||
"nodeType": "Stmt_Function",
|
||||
"byRef": false,
|
||||
"name": "printLine",
|
||||
"params": [
|
||||
{
|
||||
"nodeType": "Param",
|
||||
"type": null,
|
||||
"byRef": false,
|
||||
"variadic": false,
|
||||
"name": "msg",
|
||||
"default": null,
|
||||
"attributes": {
|
||||
"startLine": 3,
|
||||
"endLine": 3
|
||||
}
|
||||
}
|
||||
],
|
||||
"returnType": null,
|
||||
"stmts": [
|
||||
{
|
||||
"nodeType": "Stmt_Echo",
|
||||
"exprs": [
|
||||
{
|
||||
"nodeType": "Expr_Variable",
|
||||
"name": "msg",
|
||||
"attributes": {
|
||||
"startLine": 4,
|
||||
"endLine": 4
|
||||
}
|
||||
},
|
||||
{
|
||||
"nodeType": "Scalar_String",
|
||||
"value": "\n",
|
||||
"attributes": {
|
||||
"startLine": 4,
|
||||
"endLine": 4,
|
||||
"kind": 2
|
||||
}
|
||||
}
|
||||
],
|
||||
"attributes": {
|
||||
"startLine": 4,
|
||||
"endLine": 4
|
||||
}
|
||||
}
|
||||
],
|
||||
"attributes": {
|
||||
"startLine": 3,
|
||||
"endLine": 5
|
||||
}
|
||||
},
|
||||
{
|
||||
"nodeType": "Expr_FuncCall",
|
||||
"name": {
|
||||
"nodeType": "Name",
|
||||
"parts": [
|
||||
"printLine"
|
||||
],
|
||||
"attributes": {
|
||||
"startLine": 7,
|
||||
"endLine": 7
|
||||
}
|
||||
},
|
||||
"args": [
|
||||
{
|
||||
"nodeType": "Arg",
|
||||
"value": {
|
||||
"nodeType": "Scalar_String",
|
||||
"value": "Hello World!!!",
|
||||
"attributes": {
|
||||
"startLine": 7,
|
||||
"endLine": 7,
|
||||
"kind": 1
|
||||
}
|
||||
},
|
||||
"byRef": false,
|
||||
"unpack": false,
|
||||
"attributes": {
|
||||
"startLine": 7,
|
||||
"endLine": 7
|
||||
}
|
||||
}
|
||||
],
|
||||
"attributes": {
|
||||
"startLine": 7,
|
||||
"endLine": 7
|
||||
}
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
There is currently no mechanism to convert JSON back into a node tree. Furthermore, not all ASTs
|
||||
can be JSON encoded. In particular, JSON only supports UTF-8 strings.
|
||||
|
||||
Serialization to XML
|
||||
--------------------
|
||||
|
||||
It is also possible to serialize the node tree to XML using `PhpParser\Serializer\XML->serialize()`
|
||||
and to unserialize it using `PhpParser\Unserializer\XML->unserialize()`. This is useful for
|
||||
interfacing with other languages and applications or for doing transformation using XSLT.
|
||||
|
||||
```php
|
||||
<?php
|
||||
$code = <<<'CODE'
|
||||
<?php
|
||||
|
||||
function printLine($msg) {
|
||||
echo $msg, "\n";
|
||||
}
|
||||
|
||||
printLine('Hello World!!!');
|
||||
CODE;
|
||||
|
||||
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7);
|
||||
$serializer = new PhpParser\Serializer\XML;
|
||||
|
||||
try {
|
||||
$stmts = $parser->parse($code);
|
||||
|
||||
echo $serializer->serialize($stmts);
|
||||
} catch (PhpParser\Error $e) {
|
||||
echo 'Parse Error: ', $e->getMessage();
|
||||
}
|
||||
```
|
||||
|
||||
Produces:
|
||||
|
||||
```xml
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<AST xmlns:node="http://nikic.github.com/PHPParser/XML/node" xmlns:subNode="http://nikic.github.com/PHPParser/XML/subNode" xmlns:scalar="http://nikic.github.com/PHPParser/XML/scalar">
|
||||
<scalar:array>
|
||||
<node:Stmt_Function line="2">
|
||||
<subNode:byRef>
|
||||
<scalar:false/>
|
||||
</subNode:byRef>
|
||||
<subNode:params>
|
||||
<scalar:array>
|
||||
<node:Param line="2">
|
||||
<subNode:name>
|
||||
<scalar:string>msg</scalar:string>
|
||||
</subNode:name>
|
||||
<subNode:default>
|
||||
<scalar:null/>
|
||||
</subNode:default>
|
||||
<subNode:type>
|
||||
<scalar:null/>
|
||||
</subNode:type>
|
||||
<subNode:byRef>
|
||||
<scalar:false/>
|
||||
</subNode:byRef>
|
||||
</node:Param>
|
||||
</scalar:array>
|
||||
</subNode:params>
|
||||
<subNode:stmts>
|
||||
<scalar:array>
|
||||
<node:Stmt_Echo line="3">
|
||||
<subNode:exprs>
|
||||
<scalar:array>
|
||||
<node:Expr_Variable line="3">
|
||||
<subNode:name>
|
||||
<scalar:string>msg</scalar:string>
|
||||
</subNode:name>
|
||||
</node:Expr_Variable>
|
||||
<node:Scalar_String line="3">
|
||||
<subNode:value>
|
||||
<scalar:string>
|
||||
</scalar:string>
|
||||
</subNode:value>
|
||||
</node:Scalar_String>
|
||||
</scalar:array>
|
||||
</subNode:exprs>
|
||||
</node:Stmt_Echo>
|
||||
</scalar:array>
|
||||
</subNode:stmts>
|
||||
<subNode:name>
|
||||
<scalar:string>printLine</scalar:string>
|
||||
</subNode:name>
|
||||
</node:Stmt_Function>
|
||||
<node:Expr_FuncCall line="6">
|
||||
<subNode:name>
|
||||
<node:Name line="6">
|
||||
<subNode:parts>
|
||||
<scalar:array>
|
||||
<scalar:string>printLine</scalar:string>
|
||||
</scalar:array>
|
||||
</subNode:parts>
|
||||
</node:Name>
|
||||
</subNode:name>
|
||||
<subNode:args>
|
||||
<scalar:array>
|
||||
<node:Arg line="6">
|
||||
<subNode:value>
|
||||
<node:Scalar_String line="6">
|
||||
<subNode:value>
|
||||
<scalar:string>Hello World!!!</scalar:string>
|
||||
</subNode:value>
|
||||
</node:Scalar_String>
|
||||
</subNode:value>
|
||||
<subNode:byRef>
|
||||
<scalar:false/>
|
||||
</subNode:byRef>
|
||||
</node:Arg>
|
||||
</scalar:array>
|
||||
</subNode:args>
|
||||
</node:Expr_FuncCall>
|
||||
</scalar:array>
|
||||
</AST>
|
||||
```
|
||||
@@ -1,84 +0,0 @@
|
||||
Code generation
|
||||
===============
|
||||
|
||||
It is also possible to generate code using the parser, by first creating an Abstract Syntax Tree and then using the
|
||||
pretty printer to convert it to PHP code. To simplify code generation, the project comes with builders which allow
|
||||
creating node trees using a fluid interface, instead of instantiating all nodes manually. Builders are available for
|
||||
the following syntactic elements:
|
||||
|
||||
* namespaces and use statements
|
||||
* classes, interfaces and traits
|
||||
* methods, functions and parameters
|
||||
* properties
|
||||
|
||||
Here is an example:
|
||||
|
||||
```php
|
||||
use PhpParser\BuilderFactory;
|
||||
use PhpParser\PrettyPrinter;
|
||||
use PhpParser\Node;
|
||||
|
||||
$factory = new BuilderFactory;
|
||||
$node = $factory->namespace('Name\Space')
|
||||
->addStmt($factory->use('Some\Other\Thingy')->as('SomeOtherClass'))
|
||||
->addStmt($factory->class('SomeClass')
|
||||
->extend('SomeOtherClass')
|
||||
->implement('A\Few', '\Interfaces')
|
||||
->makeAbstract() // ->makeFinal()
|
||||
|
||||
->addStmt($factory->method('someMethod')
|
||||
->makePublic()
|
||||
->makeAbstract() // ->makeFinal()
|
||||
->setReturnType('bool')
|
||||
->addParam($factory->param('someParam')->setTypeHint('SomeClass'))
|
||||
->setDocComment('/**
|
||||
* This method does something.
|
||||
*
|
||||
* @param SomeClass And takes a parameter
|
||||
*/')
|
||||
)
|
||||
|
||||
->addStmt($factory->method('anotherMethod')
|
||||
->makeProtected() // ->makePublic() [default], ->makePrivate()
|
||||
->addParam($factory->param('someParam')->setDefault('test'))
|
||||
// it is possible to add manually created nodes
|
||||
->addStmt(new Node\Expr\Print_(new Node\Expr\Variable('someParam')))
|
||||
)
|
||||
|
||||
// properties will be correctly reordered above the methods
|
||||
->addStmt($factory->property('someProperty')->makeProtected())
|
||||
->addStmt($factory->property('anotherProperty')->makePrivate()->setDefault(array(1, 2, 3)))
|
||||
)
|
||||
|
||||
->getNode()
|
||||
;
|
||||
|
||||
$stmts = array($node);
|
||||
$prettyPrinter = new PrettyPrinter\Standard();
|
||||
echo $prettyPrinter->prettyPrintFile($stmts);
|
||||
```
|
||||
|
||||
This will produce the following output with the standard pretty printer:
|
||||
|
||||
```php
|
||||
<?php
|
||||
|
||||
namespace Name\Space;
|
||||
|
||||
use Some\Other\Thingy as SomeClass;
|
||||
abstract class SomeClass extends SomeOtherClass implements A\Few, \Interfaces
|
||||
{
|
||||
protected $someProperty;
|
||||
private $anotherProperty = array(1, 2, 3);
|
||||
/**
|
||||
* This method does something.
|
||||
*
|
||||
* @param SomeClass And takes a parameter
|
||||
*/
|
||||
public abstract function someMethod(SomeClass $someParam) : bool;
|
||||
protected function anotherMethod($someParam = 'test')
|
||||
{
|
||||
print $someParam;
|
||||
}
|
||||
}
|
||||
```
|
||||
@@ -1,75 +0,0 @@
|
||||
Error handling
|
||||
==============
|
||||
|
||||
Errors during parsing or analysis are represented using the `PhpParser\Error` exception class. In addition to an error
|
||||
message, an error can also store additional information about the location the error occurred at.
|
||||
|
||||
How much location information is available depends on the origin of the error and how many lexer attributes have been
|
||||
enabled. At a minimum the start line of the error is usually available.
|
||||
|
||||
Column information
|
||||
------------------
|
||||
|
||||
In order to receive information about not only the line, but also the column span an error occurred at, the file
|
||||
position attributes in the lexer need to be enabled:
|
||||
|
||||
```php
|
||||
$lexer = new PhpParser\Lexer(array(
|
||||
'usedAttributes' => array('comments', 'startLine', 'endLine', 'startFilePos', 'endFilePos'),
|
||||
));
|
||||
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7, $lexer);
|
||||
|
||||
try {
|
||||
$stmts = $parser->parse($code);
|
||||
// ...
|
||||
} catch (PhpParser\Error $e) {
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
Before using column information its availability needs to be checked with `$e->hasColumnInfo()`, as the precise
|
||||
location of an error cannot always be determined. The methods for retrieving column information also have to be passed
|
||||
the source code of the parsed file. An example for printing an error:
|
||||
|
||||
```php
|
||||
if ($e->hasColumnInfo()) {
|
||||
echo $e->getRawMessage() . ' from ' . $e->getStartLine() . ':' . $e->getStartColumn($code)
|
||||
. ' to ' . $e->getEndLine() . ':' . $e->getEndColumn($code);
|
||||
// or:
|
||||
echo $e->getMessageWithColumnInfo();
|
||||
} else {
|
||||
echo $e->getMessage();
|
||||
}
|
||||
```
|
||||
|
||||
Both line numbers and column numbers are 1-based. EOF errors will be located at the position one past the end of the
|
||||
file.
|
||||
|
||||
Error recovery
|
||||
--------------
|
||||
|
||||
The error behavior of the parser (and other components) is controlled by an `ErrorHandler`. Whenever an error is
|
||||
encountered, `ErrorHandler::handleError()` is invoked. The default error handling strategy is `ErrorHandler\Throwing`,
|
||||
which will immediately throw when an error is encountered.
|
||||
|
||||
To instead collect all encountered errors into an array, while trying to continue parsing the rest of the source code,
|
||||
an instance of `ErrorHandler\Collecting` can be passed to the `Parser::parse()` method. A usage example:
|
||||
|
||||
```php
|
||||
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::ONLY_PHP7);
|
||||
$errorHandler = new PhpParser\ErrorHandler\Collecting;
|
||||
|
||||
$stmts = $parser->parse($code, $errorHandler);
|
||||
|
||||
if ($errorHandler->hasErrors()) {
|
||||
foreach ($errorHandler->getErrors() as $error) {
|
||||
// $error is an ordinary PhpParser\Error
|
||||
}
|
||||
}
|
||||
|
||||
if (null !== $stmts) {
|
||||
// $stmts is a best-effort partial AST
|
||||
}
|
||||
```
|
||||
|
||||
The `NameResolver` visitor also accepts an `ErrorHandler` as a constructor argument.
|
||||
@@ -1,152 +0,0 @@
|
||||
Lexer component documentation
|
||||
=============================
|
||||
|
||||
The lexer is responsible for providing tokens to the parser. The project comes with two lexers: `PhpParser\Lexer` and
|
||||
`PhpParser\Lexer\Emulative`. The latter is an extension of the former, which adds the ability to emulate tokens of
|
||||
newer PHP versions and thus allows parsing of new code on older versions.
|
||||
|
||||
This documentation discusses options available for the default lexers and explains how lexers can be extended.
|
||||
|
||||
Lexer options
|
||||
-------------
|
||||
|
||||
The two default lexers accept an `$options` array in the constructor. Currently only the `'usedAttributes'` option is
|
||||
supported, which allows you to specify which attributes will be added to the AST nodes. The attributes can then be
|
||||
accessed using `$node->getAttribute()`, `$node->setAttribute()`, `$node->hasAttribute()` and `$node->getAttributes()`
|
||||
methods. A sample options array:
|
||||
|
||||
```php
|
||||
$lexer = new PhpParser\Lexer(array(
|
||||
'usedAttributes' => array(
|
||||
'comments', 'startLine', 'endLine'
|
||||
)
|
||||
));
|
||||
```
|
||||
|
||||
The attributes used in this example match the default behavior of the lexer. The following attributes are supported:
|
||||
|
||||
* `comments`: Array of `PhpParser\Comment` or `PhpParser\Comment\Doc` instances, representing all comments that occurred
|
||||
between the previous non-discarded token and the current one. Use of this attribute is required for the
|
||||
`$node->getDocComment()` method to work. The attribute is also needed if you wish the pretty printer to retain
|
||||
comments present in the original code.
|
||||
* `startLine`: Line in which the node starts. This attribute is required for the `$node->getLine()` to work. It is also
|
||||
required if syntax errors should contain line number information.
|
||||
* `endLine`: Line in which the node ends.
|
||||
* `startTokenPos`: Offset into the token array of the first token in the node.
|
||||
* `endTokenPos`: Offset into the token array of the last token in the node.
|
||||
* `startFilePos`: Offset into the code string of the first character that is part of the node.
|
||||
* `endFilePos`: Offset into the code string of the last character that is part of the node.
|
||||
|
||||
### Using token positions
|
||||
|
||||
The token offset information is useful if you wish to examine the exact formatting used for a node. For example the AST
|
||||
does not distinguish whether a property was declared using `public` or using `var`, but you can retrieve this
|
||||
information based on the token position:
|
||||
|
||||
```php
|
||||
function isDeclaredUsingVar(array $tokens, PhpParser\Node\Stmt\Property $prop) {
|
||||
$i = $prop->getAttribute('startTokenPos');
|
||||
return $tokens[$i][0] === T_VAR;
|
||||
}
|
||||
```
|
||||
|
||||
In order to make use of this function, you will have to provide the tokens from the lexer to your node visitor using
|
||||
code similar to the following:
|
||||
|
||||
```php
|
||||
class MyNodeVisitor extends PhpParser\NodeVisitorAbstract {
|
||||
private $tokens;
|
||||
public function setTokens(array $tokens) {
|
||||
$this->tokens = $tokens;
|
||||
}
|
||||
|
||||
public function leaveNode(PhpParser\Node $node) {
|
||||
if ($node instanceof PhpParser\Node\Stmt\Property) {
|
||||
var_dump(isDeclaredUsingVar($this->tokens, $node));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
$lexer = new PhpParser\Lexer(array(
|
||||
'usedAttributes' => array(
|
||||
'comments', 'startLine', 'endLine', 'startTokenPos', 'endTokenPos'
|
||||
)
|
||||
));
|
||||
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7, $lexer);
|
||||
|
||||
$visitor = new MyNodeVisitor();
|
||||
$traverser = new PhpParser\NodeTraverser();
|
||||
$traverser->addVisitor($visitor);
|
||||
|
||||
try {
|
||||
$stmts = $parser->parse($code);
|
||||
$visitor->setTokens($lexer->getTokens());
|
||||
$stmts = $traverser->traverse($stmts);
|
||||
} catch (PhpParser\Error $e) {
|
||||
echo 'Parse Error: ', $e->getMessage();
|
||||
}
|
||||
```
|
||||
|
||||
The same approach can also be used to perform specific modifications in the code, without changing the formatting in
|
||||
other places (which is the case when using the pretty printer).
|
||||
|
||||
Lexer extension
|
||||
---------------
|
||||
|
||||
A lexer has to define the following public interface:
|
||||
|
||||
void startLexing(string $code, ErrorHandler $errorHandler = null);
|
||||
array getTokens();
|
||||
string handleHaltCompiler();
|
||||
int getNextToken(string &$value = null, array &$startAttributes = null, array &$endAttributes = null);
|
||||
|
||||
The `startLexing()` method is invoked with the source code that is to be lexed (including the opening tag) whenever the
|
||||
`parse()` method of the parser is called. It can be used to reset state or preprocess the source code or tokens. The
|
||||
passes `ErrorHandler` should be used to report lexing errors.
|
||||
|
||||
The `getTokens()` method returns the current token array, in the usual `token_get_all()` format. This method is not
|
||||
used by the parser (which uses `getNextToken()`), but is useful in combination with the token position attributes.
|
||||
|
||||
The `handleHaltCompiler()` method is called whenever a `T_HALT_COMPILER` token is encountered. It has to return the
|
||||
remaining string after the construct (not including `();`).
|
||||
|
||||
The `getNextToken()` method returns the ID of the next token (as defined by the `Parser::T_*` constants). If no more
|
||||
tokens are available it must return `0`, which is the ID of the `EOF` token. Furthermore the string content of the
|
||||
token should be written into the by-reference `$value` parameter (which will then be available as `$n` in the parser).
|
||||
|
||||
### Attribute handling
|
||||
|
||||
The other two by-ref variables `$startAttributes` and `$endAttributes` define which attributes will eventually be
|
||||
assigned to the generated nodes: The parser will take the `$startAttributes` from the first token which is part of the
|
||||
node and the `$endAttributes` from the last token that is part of the node.
|
||||
|
||||
E.g. if the tokens `T_FUNCTION T_STRING ... '{' ... '}'` constitute a node, then the `$startAttributes` from the
|
||||
`T_FUNCTION` token will be taken and the `$endAttributes` from the `'}'` token.
|
||||
|
||||
An application of custom attributes is storing the exact original formatting of literals: While the parser does retain
|
||||
some information about the formatting of integers (like decimal vs. hexadecimal) or strings (like used quote type), it
|
||||
does not preserve the exact original formatting (e.g. leading zeros for integers or escape sequences in strings). This
|
||||
can be remedied by storing the original value in an attribute:
|
||||
|
||||
```php
|
||||
use PhpParser\Lexer;
|
||||
use PhpParser\Parser\Tokens;
|
||||
|
||||
class KeepOriginalValueLexer extends Lexer // or Lexer\Emulative
|
||||
{
|
||||
public function getNextToken(&$value = null, &$startAttributes = null, &$endAttributes = null) {
|
||||
$tokenId = parent::getNextToken($value, $startAttributes, $endAttributes);
|
||||
|
||||
if ($tokenId == Tokens::T_CONSTANT_ENCAPSED_STRING // non-interpolated string
|
||||
|| $tokenId == Tokens::T_ENCAPSED_AND_WHITESPACE // interpolated string
|
||||
|| $tokenId == Tokens::T_LNUMBER // integer
|
||||
|| $tokenId == Tokens::T_DNUMBER // floating point number
|
||||
) {
|
||||
// could also use $startAttributes, doesn't really matter here
|
||||
$endAttributes['originalValue'] = $value;
|
||||
}
|
||||
|
||||
return $tokenId;
|
||||
}
|
||||
}
|
||||
```
|
||||
Reference in New Issue
Block a user