Commit 26ca1f2

MSAdministrator, rickardja, and joeyaiello authored
Reject ConvertFrom-Html (RFC0055) (#137)
* Submitting ConvertFrom-Html RFC
* Added proposal statement to the top of RFC
* Removed output from BasicHtmlWebResponseObject
* Clarified output to be PSCustomObject in Motivation Section
* Remove proposal and modified parameter definition in Specification section
* Removed CommonParameters from RFC
* Modified example to accept parameter binding of WebRequest object - Content property
* Modified Alternate Proposals to clarify output is PSCustomObject
* Prepare RFC0055 - ConvertFrom-Html for rejection

Co-authored-by: Josh Rickard <[email protected]>
Co-authored-by: Joey Aiello <[email protected]>
1 parent 983572c commit 26ca1f2

1 file changed

+157
-0
lines changed

@@ -0,0 +1,157 @@
---
RFC: '0055'
Author: Josh Rickard
Status: Rejected
SupercededBy:
Version: 1.0
Area: PowerShell Core Web CmdLets
Comments Due: August 31st, 2018
Plan to implement:
---

# ConvertFrom-Html

The proposal is to create a new `ConvertFrom-Html` cmdlet that converts HTML strings to objects in PowerShell Core.

The PowerShell Core web cmdlets currently return only the `BasicHtmlWebResponseObject` type and do not have access to `HtmlWebResponseObject`. As a result, the ability to parse HTML through the `ParsedHtml` property of `HtmlWebResponseObject` does not exist in PowerShell Core.

Windows PowerShell does contain `HtmlWebResponseObject`, but its web cmdlets rely on Internet Explorer to parse HTML content. Since non-Windows systems do not have Internet Explorer, PowerShell Core uses `BasicHtmlWebResponseObject`, which does not contain the `ParsedHtml` property.

This RFC proposes the creation of a new cmdlet named `ConvertFrom-Html`. The cmdlet is to be implemented in PowerShell Core and should use the [AngleSharp](https://github.com/AngleSharp/AngleSharp) library to convert HTML strings into a PSCustomObject.
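
To make the proposal concrete, the following is a minimal sketch of how a script function might wrap AngleSharp. It is not the RFC's implementation: the function name `ConvertFrom-HtmlWithAngleSharp`, the `-AngleSharpPath` parameter, and the `Title`/`Links` output shape are all assumptions made for illustration, and the sketch assumes an AngleSharp release that exposes `AngleSharp.Html.Parser.HtmlParser` with a `ParseDocument` method.

```powershell
# Hypothetical helper, not part of the RFC: wraps AngleSharp's HtmlParser.
function ConvertFrom-HtmlWithAngleSharp {
    [CmdletBinding()]
    param(
        # HTML text to parse.
        [Parameter(Mandatory, Position = 0, ValueFromPipeline)]
        [AllowEmptyString()]
        [string] $InputObject,

        # Assumed location of AngleSharp.dll on the local machine.
        [Parameter(Mandatory)]
        [string] $AngleSharpPath
    )
    begin {
        # Load the assembly and create one parser for the whole pipeline.
        Add-Type -Path $AngleSharpPath
        $parser = [AngleSharp.Html.Parser.HtmlParser]::new()
    }
    process {
        # Mirror the RFC: an empty string produces no output.
        if ($InputObject.Length -eq 0) { return }

        $document = $parser.ParseDocument($InputObject)

        # Illustrative output shape only; the RFC does not define the properties.
        [pscustomobject]@{
            Title = $document.Title
            Links = @($document.QuerySelectorAll('a[href]') | ForEach-Object {
                [pscustomobject]@{
                    Text = $_.TextContent.Trim()
                    Href = $_.GetAttribute('href')
                }
            })
        }
    }
}
```

Under these assumptions, `$html | ConvertFrom-HtmlWithAngleSharp -AngleSharpPath <path to AngleSharp.dll>` would emit one object per input string.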

## Motivation

As a PowerShell Core user, I can convert HTML content to objects so that I can easily work with downloaded or local HTML content.

As an IT administrator, I can call `Invoke-WebRequest` and then use `ConvertFrom-Html` to convert the `Content` of my web request to a PSCustomObject so that I can easily work with HTML strings and content.

As an IT administrator, I can call `Invoke-WebRequest` and then use `ConvertFrom-Html` to convert the `Content` of my web request to a PSCustomObject so that I can easily convert it to another type (JSON, CSV, XML, etc.).

As an IT administrator, I can pipe a string into `ConvertFrom-Html` to convert it to a PSCustomObject so that I can easily convert it to another type, modify the object, and use the `ConvertTo-Html` cmdlet to convert it back to HTML.

## Specification

- InputObject parameter
  - Specifies the HTML strings to convert to PSCustomObject objects. Enter a variable that contains the string, or type a command or expression that gets the string. You can also pipe a string to `ConvertFrom-Html`.
  - The InputObject parameter is required, but its value can be an empty string. When the input object is an empty string, `ConvertFrom-Html` does not generate any output. The InputObject value cannot be `$null`.

### Syntax

```text
ConvertFrom-Html [-InputObject] <String> [<CommonParameters>]
```

### PARAMETERS

#### -InputObject

Specifies the HTML strings to convert to PSCustomObject objects.
Enter a variable that contains the string, or type a command or expression that gets the string.
You can also pipe a string to **ConvertFrom-Html**.

The *InputObject* parameter is required, but its value can be an empty string.
When the input object is an empty string, **ConvertFrom-Html** does not generate any output.
The *InputObject* value cannot be `$null`.

```yaml
Type: String
Parameter Sets: (All)
Aliases:

Required: True
Position: 0
Default value: None
Accept pipeline input: True (ByValue)
Accept wildcard characters: False
```
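
The parameter contract above can be sketched as an advanced function. This is an illustration only: `ConvertFrom-HtmlSketch` and its placeholder output properties (`Html`, `CharacterCount`) are hypothetical names, no HTML parsing is performed, and the sketch does not attempt to enforce the `$null` restriction.

```powershell
# Hypothetical stand-in for the proposed cmdlet; only the parameter contract is modelled.
function ConvertFrom-HtmlSketch {
    [CmdletBinding()]
    [OutputType([pscustomobject])]
    param(
        # Required, position 0, accepts pipeline input by value; an empty string is allowed.
        [Parameter(Mandatory, Position = 0, ValueFromPipeline)]
        [AllowEmptyString()]
        [string] $InputObject
    )
    process {
        # An empty string produces no output, as the specification states.
        if ($InputObject.Length -eq 0) { return }

        # Placeholder output; a real implementation would emit the parsed document.
        [pscustomobject]@{
            Html           = $InputObject
            CharacterCount = $InputObject.Length
        }
    }
}

# All three call styles bind to -InputObject.
$byName     = ConvertFrom-HtmlSketch -InputObject '<p>Hello</p>'
$byPosition = ConvertFrom-HtmlSketch '<p>Hello</p>'
$byPipeline = '<p>Hello</p>', '' | ConvertFrom-HtmlSketch   # the empty string yields nothing
```

`[AllowEmptyString()]` is what lets a mandatory `String` parameter accept an empty value; without it, binding an empty string to a mandatory parameter fails before `process` runs.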

### INPUTS

#### System.String

You can pipe an HTML string to **ConvertFrom-Html**.

### OUTPUTS

#### PSCustomObject

### Examples

You can provide a string to convert to an HTML object using the `InputObject` parameter:

```powershell
ConvertFrom-Html -InputObject $InvokeWebRequestObject
```

You can provide a string to convert to an HTML object positionally, since `InputObject` is at position 0:

```powershell
ConvertFrom-Html $InvokeWebRequestObject
```

You can pipe a string into `ConvertFrom-Html`:

```powershell
$htmlString = @"
<HTML>

<HEAD>
<TITLE>Your Title Here</TITLE>
</HEAD>

<BODY BGCOLOR="FFFFFF">
<CENTER>
<IMG SRC="clouds.jpg" ALIGN="BOTTOM"> </CENTER>
<HR>
<a href="http://somegreatsite.com">Link Name</a>
is a link to another nifty site
<H1>This is a Header</H1>
<H2>This is a Medium Header</H2>
Send me mail at
<a href="mailto:[email protected]">

<P> This is a new paragraph!
<P>
<B>This is a new paragraph!</B>
<BR>
<B>
<I>This is a new sentence without a paragraph break, in bold italics.</I>
</B>
<HR>
</BODY>

</HTML>
"@

$htmlObject = $htmlString | ConvertFrom-Html
```

An advanced example that uses `Invoke-WebRequest` and converts the returned content to a PSCustomObject:

```powershell
$dnsDumpsterURL = 'https://dnsdumpster.com/'
# The first request establishes the session and returns the form's input fields.
$dumpsterRequest = Invoke-WebRequest -Uri $dnsDumpsterURL -SessionVariable session

$props = @{
    Uri        = $dnsDumpsterURL
    Headers    = @{Referer = $dnsDumpsterURL; 'Content-Type' = 'application/x-www-form-urlencoded'}
    WebSession = $session
    Body       = @{
        'csrfmiddlewaretoken' = $dumpsterRequest.InputFields.value
        'targetip'            = 'microsoft.com'
    }
    Method     = 'Post'
}

# Post the form in the same session and convert the response.
$dnsDumpsterObject = Invoke-WebRequest @props | ConvertFrom-Html
```

## Alternate Proposals and Considerations

Some considerations to keep in mind:

- Converted HTML may be piped to any number of cmdlets, for example `ConvertTo-Json`, `ConvertTo-Csv`, `ConvertTo-Xml`, and `ConvertTo-Html`.
- Based on conversations in #3267 and #2867, this cmdlet should use AngleSharp to parse HTML strings and output a PSCustomObject.
- We should support the same platforms that PowerShell is supported on: Win32, Ubuntu 14/16, CentOS 7, and macOS 10.
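
The first consideration can be illustrated with existing cmdlets. Because `ConvertFrom-Html` was never implemented, the sketch below builds a stand-in object by hand; the `Title` and `Links` properties are assumptions, since the RFC does not define the output shape.

```powershell
# Hypothetical stand-in for ConvertFrom-Html output; the property names are assumed.
$htmlObject = [pscustomobject]@{
    Title = 'Your Title Here'
    Links = @(
        [pscustomobject]@{ Text = 'Link Name'; Href = 'http://somegreatsite.com' }
    )
}

# Serialize the whole object; -Depth sets how many levels of nested objects are included.
$json = $htmlObject | ConvertTo-Json -Depth 3

# A flat collection such as the links maps naturally onto CSV rows.
$csv = $htmlObject.Links | ConvertTo-Csv -NoTypeInformation
```

Nested structures suit `ConvertTo-Json`, while `ConvertTo-Csv` works best on a single flat collection taken from the object.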
