|
| 1 | +--- |
| 2 | +RFC: '0055' |
| 3 | +Author: Josh Rickard |
| 4 | +Status: Rejected |
| 5 | +SupercededBy: |
| 6 | +Version: 1.0 |
| 7 | +Area: PowerShell Core Web CmdLets |
| 8 | +Comments Due: August 31st, 2018 |
| 9 | +Plan to implement: |
| 10 | +--- |
| 11 | + |
| 12 | +# ConvertFrom-Html |
| 13 | + |
| 14 | +The proposal is to create a new `ConvertFrom-Html` cmdlet that converts HTML strings into objects in PowerShell Core. |
| 15 | + |
| 16 | +Currently, the PowerShell Core Web CmdLets do not have access to the `HtmlWebResponseObject` type; they return only the `BasicHtmlWebResponseObject` type. Because of this, the capability to parse HTML using the `ParsedHtml` property of the `HtmlWebResponseObject` type does not exist within PowerShell Core. |
| 17 | + |
| 18 | +Windows PowerShell does contain the `HtmlWebResponseObject`, but PowerShell Core currently only contains the `BasicHtmlWebResponseObject` type. |
| 19 | + |
| 20 | +Additionally, Windows PowerShell Web CmdLets utilize Internet Explorer to parse HTML content. Since non-Windows systems do not have Internet Explorer, PowerShell Core utilizes the `BasicHtmlWebResponseObject` type, which does not contain the `ParsedHtml` property. |
| 21 | + |
| 22 | +This RFC proposes the creation of a new CmdLet named `ConvertFrom-Html`. This CmdLet is to be implemented in PowerShell Core and should utilize the [AngleSharp](https://github.com/AngleSharp/AngleSharp) framework for converting HTML strings into a PSCustomObject. |
| 23 | + |
| 24 | +## Motivation |
| 25 | + |
| 26 | +As a PowerShell Core user, I can convert HTML content to objects so that I can easily work with downloaded or local HTML content. |
| 27 | + |
| 28 | +As an IT Administrator, I can call `Invoke-WebRequest` and then use `ConvertFrom-Html` to convert the `Content` of my web request to a PSCustomObject so that I can easily work with HTML strings/content. |
| 29 | + |
| 30 | +As an IT Administrator, I can call `Invoke-WebRequest` and then use `ConvertFrom-Html` to convert the `Content` of my web request to a PSCustomObject so that I can easily convert it to another type (JSON, CSV, XML, etc.). |
| 31 | + |
| 32 | +As an IT Administrator, I can pipe a string into `ConvertFrom-Html` to convert it to a PSCustomObject so that I can easily convert it to another type, modify the object, and use the `ConvertTo-Html` CmdLet to convert it back to HTML. |
| 33 | + |
| 34 | +## Specification |
| 35 | + |
| 36 | +- InputObject parameter |
| 37 | + - Specifies the HTML strings to convert to PSCustomObject objects. Enter a variable that contains the string, or type a command or expression that gets the string. You can also pipe a string to ConvertFrom-Html. |
| 38 | + - The InputObject parameter is required, but its value can be an empty string. When the input object is an empty string, ConvertFrom-Html does not generate any output. The InputObject value cannot be $Null. |
| 39 | + |
| 40 | +### Syntax |
| 41 | + |
| 42 | +```text |
| 43 | +ConvertFrom-Html [-InputObject] <String> [<CommonParameters>] |
| 44 | +``` |
| 45 | + |
| 46 | +### PARAMETERS |
| 47 | + |
| 48 | +#### -InputObject |
| 49 | + |
| 50 | +Specifies the HTML strings to convert to HTML objects. |
| 51 | +Enter a variable that contains the string, or type a command or expression that gets the string. |
| 52 | +You can also pipe a string to **ConvertFrom-Html**. |
| 53 | + |
| 54 | +The *InputObject* parameter is required, but its value can be an empty string. |
| 55 | +When the input object is an empty string, **ConvertFrom-Html** does not generate any output. |
| 56 | +The *InputObject* value cannot be $Null. |
| 57 | + |
| 58 | +```yaml |
| 59 | +Type: String |
| 60 | +Parameter Sets: (All) |
| 61 | +Aliases: |
| 62 | + |
| 63 | +Required: True |
| 64 | +Position: 0 |
| 65 | +Default value: None |
| 66 | +Accept pipeline input: True (ByValue) |
| 67 | +Accept wildcard characters: False |
| 68 | +``` |
| 69 | + |
| 70 | +### INPUTS |
| 71 | + |
| 72 | +#### System.String |
| 73 | + |
| 74 | +You can pipe an HTML string to **ConvertFrom-Html**. |
| 75 | + |
| 76 | +### OUTPUTS |
| 77 | + |
| 78 | +#### PSCustomObject |
| 79 | + |
| 80 | +### Examples |
| 81 | + |
| 82 | +You can provide a string to convert to an HTML object using the `InputObject` parameter: |
| 83 | + |
| 84 | +```powershell |
| 85 | +ConvertFrom-Html -InputObject $InvokeWebRequestObject |
| 86 | +``` |
| 87 | + |
| 88 | +You can provide a string to convert to an HTML object positionally, because `InputObject` is bound at position 0: |
| 89 | + |
| 90 | +```powershell |
| 91 | +ConvertFrom-Html $InvokeWebRequestObject |
| 92 | +``` |
| 93 | + |
| 94 | +You can pipe a string into `ConvertFrom-Html`: |
| 95 | + |
| 96 | +```powershell |
| 97 | +$htmlString = @" |
| 98 | +<HTML> |
| 99 | + |
| 100 | +<HEAD> |
| 101 | + <TITLE>Your Title Here</TITLE> |
| 102 | +</HEAD> |
| 103 | + |
| 104 | +<BODY BGCOLOR="FFFFFF"> |
| 105 | + <CENTER> |
| 106 | + <IMG SRC="clouds.jpg" ALIGN="BOTTOM"> </CENTER> |
| 107 | + <HR> |
| 108 | + <a href="http://somegreatsite.com">Link Name</a> |
| 109 | + is a link to another nifty site |
| 110 | + <H1>This is a Header</H1> |
| 111 | + <H2>This is a Medium Header</H2> |
| 112 | + Send me mail at |
| 113 | + <a href="mailto:[email protected]"> |
| 114 | + |
| 115 | + <P> This is a new paragraph! |
| 116 | + <P> |
| 117 | + <B>This is a new paragraph!</B> |
| 118 | + <BR> |
| 119 | + <B> |
| 120 | + <I>This is a new sentence without a paragraph break, in bold italics.</I> |
| 121 | + </B> |
| 122 | + <HR> |
| 123 | +</BODY> |
| 124 | + |
| 125 | +</HTML> |
| 126 | +"@ |
| 127 | + |
| 128 | +$htmlObject = $htmlString | ConvertFrom-Html |
| 129 | +``` |
| 130 | + |
| 131 | +Advanced example using Invoke-WebRequest and converting the returned content to a PSCustomObject. |
| 132 | + |
| 133 | +```powershell |
| 134 | +$dnsDumpsterURL = 'https://dnsdumpster.com/' |
| 135 | +$dumpsterRequest = Invoke-WebRequest -Uri $dnsDumpsterURL -SessionVariable session |
| 136 | + |
| 137 | +$props = @{ |
| 138 | + Uri = $dnsDumpsterURL |
| 139 | + Headers = @{Referer = $dnsDumpsterURL; 'Content-Type' = 'application/x-www-form-urlencoded'} |
| 140 | + WebSession = $session |
| 141 | + Body = @{ |
| 142 | + 'csrfmiddlewaretoken' = $dumpsterRequest.InputFields.value; |
| 143 | + 'targetip' = 'microsoft.com' |
| 144 | + } |
| 145 | + Method = 'Post' |
| 146 | +} |
| 147 | + |
| 148 | +$dnsDumpsterObject = Invoke-WebRequest @props | ConvertFrom-Html |
| 149 | +``` |
| 150 | + |
| 151 | +## Alternate Proposals and Considerations |
| 152 | + |
| 153 | +Some considerations to keep in mind: |
| 154 | + |
| 155 | +- Converted HTML may be piped to any number of CmdLets, for example `ConvertTo-Json`, `ConvertTo-Csv`, `ConvertTo-Xml`, and `ConvertTo-Html`. |
| 156 | +- Based on conversations in #3267 and #2867, this CmdLet should use AngleSharp to parse HTML strings and output a PSCustomObject. |
| 157 | +- We should support the same platforms that PowerShell is supported on: Win32, Ubuntu 14/16, CentOS 7, macOS 10. |