<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link rel="stylesheet" href="pdfsyntax.css">
<title>PDFSyntax</title>
</head>
<body>
<header>
<a href="https://pdfsyntax.dev"><img src="logo.svg" width="50"/></a>
</header>
<main>
<h2>CLI</h2>
<p></p>
<h3>Usage</h3>
<p>The general form of the CLI usage is:</p>
<pre> python3 -m pdfsyntax COMMAND FILE
</pre>
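<p>The same invocation can be scripted. The following is a minimal sketch, not part of PDFSyntax: the helper names <code>build_argv</code> and <code>run_cli</code> are hypothetical, and the list of commands is simply copied from this page.</p>

```python
import subprocess
import sys

# Command names taken from this page (assumption: not queried from pdfsyntax itself).
COMMANDS = ("overview", "disasm", "text", "fonts", "browse")


def build_argv(command, pdf_path, python=sys.executable):
    """Return the argument list for `python3 -m pdfsyntax COMMAND FILE`."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    return [python, "-m", "pdfsyntax", command, str(pdf_path)]


def run_cli(command, pdf_path):
    """Run the CLI and return its standard output as text.

    Hypothetical helper: it requires pdfsyntax to be installed in the
    interpreter that runs this script.
    """
    result = subprocess.run(
        build_argv(command, pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

<p>For example, <code>run_cli("overview", "file.pdf")</code> would return the text that the <code>overview</code> command prints.</p>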
<p>You can get quick insights on a PDF file with these commands:</p>
<ul><li><code>overview</code> outputs text data about the structure and the metadata.</li>
<li><code>disasm</code> outputs a dump of the file structure on the terminal.</li>
<li><code>text</code> spatially extracts text content from all pages, as if it were a kind of scan.</li>
<li><code>fonts</code> outputs a list of the fonts used in the file.</li>
<li><code>browse</code> outputs static HTML data that lets you browse the internal structure of the PDF file: the PDF source is pretty-printed and augmented with hyperlinks.</li>
</ul>
<h3><code>overview</code></h3>
<p>The output shows information about:</p>
<ul><li>the structure: Version, Pages, Revisions, etc.</li>
<li>the metadata: Title, Author, Subject, etc.</li>
</ul>
<h3><code>disasm</code></h3>
<p>The output shows a terse and greppable view of the file's internal structure. Please refer to the <a href='https://github.com/desgeeko/pdfsyntax/blob/main/docs/disassembler.md'>Disassembler article</a> for details.</p>
<h3><code>text</code></h3>
<p>The output shows a full extract of the text content, with spatial awareness: the algorithm <em>tries</em> to respect the original layout, as if characters of all sizes were approximately rendered on a fixed-size grid.</p>
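<p>To illustrate the fixed-size grid idea, here is a simplified sketch. It is not the algorithm used by PDFSyntax: the function name <code>render_on_grid</code>, the cell dimensions and the top-left coordinate origin are all assumptions made for the example.</p>

```python
def render_on_grid(fragments, cell_width=6.0, cell_height=12.0):
    """Place (x, y, text) fragments on a fixed-size character grid.

    Assumption for this sketch: x and y are page coordinates with the
    origin at the top-left corner, and every character occupies exactly
    one cell whatever its font size.
    """
    rows = {}
    for x, y, text in fragments:
        row = int(y // cell_height)
        col = int(x // cell_width)
        line = rows.setdefault(row, [])
        # Pad the line with spaces up to the target column.
        if len(line) < col:
            line.extend(" " * (col - len(line)))
        # Write one character per cell, overwriting or extending the line.
        for i, ch in enumerate(text):
            pos = col + i
            if pos < len(line):
                line[pos] = ch
            else:
                line.append(ch)
    if not rows:
        return ""
    # Emit every row up to the last used one so vertical gaps are kept.
    last = max(rows)
    return "\n".join("".join(rows.get(r, [])).rstrip() for r in range(last + 1))
```

<p>Fragments that share a vertical position land on the same output line, and the horizontal position decides the column, so a two-column table keeps its alignment in plain text.</p>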
<h3><code>fonts</code></h3>
<p>The output shows a list of fonts used in the file, with the following tabular data:</p>
<ul><li>Name</li>
<li>Type</li>
<li>Encoding</li>
<li>Object number and generation number, comma-separated</li>
<li>Number of pages where it occurs</li>
</ul>
<h3><code>browse</code></h3>
<p>This command generates HTML output that looks like the raw PDF file, with additional hyperlinks and information that expose its internal structure and the relations between its objects. Redirect the standard output to a file that you can open in your browser:</p>
<pre> python3 -m pdfsyntax browse file.pdf > inspection_file.html
</pre>
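<p>Where a shell redirection is not convenient, the same result can be obtained from a script. This is a minimal sketch, and <code>run_to_file</code> is a hypothetical helper, not a PDFSyntax function:</p>

```python
import subprocess
import sys


def run_to_file(argv, out_path):
    """Run argv and write its standard output to out_path.

    Equivalent in spirit to `COMMAND > out_path` in a shell.
    """
    with open(out_path, "wb") as out:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(argv, stdout=out, check=True)
    return out_path


# Usage, assuming pdfsyntax is installed and file.pdf exists:
# run_to_file(
#     [sys.executable, "-m", "pdfsyntax", "browse", "file.pdf"],
#     "inspection_file.html",
# )
```

<p>The file is opened in binary mode so that the bytes produced by the command are written unchanged.</p>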
<p>Please refer to the <a href='https://github.com/desgeeko/pdfsyntax/blob/main/docs/browse.md'>Browse article</a> for details.</p>
<p></p>
<blockquote><p> TO BE CONTINUED</p>
</blockquote>
</main>
<footer>
© 2025 <a href="mailto:[email protected]">Martin D.</a> <[email protected]>
</footer>
</body>
</html>