roykim1990/house_price_r

house_price_r

A sample data science project that uses a linear regression model built in R to predict house prices from the Ames Housing dataset. Specifically, this example demonstrates how to write ModelOp Center (MOC)-compliant code.

Local testing environment

To run locally, first make sure the R version and libraries match the training environment. The model was trained on R 4.2.1. To install the required packages, run:

$ R -e 'install.packages("remotes", repos="http://cran.rstudio.com", dependencies=TRUE);'
$ R -e 'remotes::install_url(url="https://cran.r-project.org/src/contrib/Archive/readr/readr_1.3.0.tar.gz", dependencies=TRUE, upgrade=TRUE);'
$ R -e 'remotes::install_url(url="https://cran.r-project.org/src/contrib/Archive/tidymodels/tidymodels_0.1.4.tar.gz", dependencies=TRUE, upgrade=TRUE);'

Assets:

  • house_price.R contains the MOC-compliant R code for scoring (prediction) and for computing metrics.
  • trained_model.RData is the trained model artifact that is loaded at prediction time. In our case, the artifact is a workflow built on top of a recipe that includes a few data cleaning steps and a call to a linear regression model.
  • The datasets used for scoring are baseline.json and sample.json. These datasets represent raw data that would first be run through a batch scoring job. A sample of the scoring job's output is provided in the output_action_sample.json file.
  • The datasets used for metrics are baseline_scored.json and sample_scored.json. These datasets represent data to which the predictions from a scoring job have been appended. The column Sale_Price is renamed to ground_truth (not a necessary step).

Directions:

  1. For a scoring job, use the baseline.json or sample.json file. The output is a JSON object that contains the original Sale_Price and the prediction for each input row.
  2. For a metrics job, use the baseline_scored.json or sample_scored.json file. The output is a list of the relevant metrics (RMSE, R2, MAE) for the regression model.

The input data to the scoring job is sample.json, which is a JSON-lines file (one-line JSON records). Here are the first two records:

{"MS_SubClass":"One_Story_1946_and_Newer_All_Styles","MS_Zoning":"Residential_Low_Density","Lot_Frontage":81,"Lot_Area":14267,"Street":"Pave","Alley":"No_Alley_Access","Lot_Shape":"Slightly_Irregular","Land_Contour":"Lvl","Utilities":"AllPub","Lot_Config":"Corner","Land_Slope":"Gtl","Neighborhood":"North_Ames","Condition_1":"Norm","Condition_2":"Norm","Bldg_Type":"OneFam","House_Style":"One_Story","Overall_Cond":"Above_Average","Year_Built":1958,"Year_Remod_Add":1958,"Roof_Style":"Hip","Roof_Matl":"CompShg","Exterior_1st":"Wd Sdng","Exterior_2nd":"Wd Sdng","Mas_Vnr_Type":"BrkFace","Mas_Vnr_Area":108,"Exter_Cond":"Typical","Foundation":"CBlock","Bsmt_Cond":"Typical","Bsmt_Exposure":"No","BsmtFin_Type_1":"ALQ","BsmtFin_SF_1":1,"BsmtFin_Type_2":"Unf","BsmtFin_SF_2":0,"Bsmt_Unf_SF":406,"Total_Bsmt_SF":1329,"Heating":"GasA","Heating_QC":"Typical","Central_Air":"Y","Electrical":"SBrkr","First_Flr_SF":1329,"Second_Flr_SF":0,"Gr_Liv_Area":1329,"Bsmt_Full_Bath":0,"Bsmt_Half_Bath":0,"Full_Bath":1,"Half_Bath":1,"Bedroom_AbvGr":3,"Kitchen_AbvGr":1,"TotRms_AbvGrd":6,"Functional":"Typ","Fireplaces":0,"Garage_Type":"Attchd","Garage_Finish":"Unf","Garage_Cars":1,"Garage_Area":312,"Garage_Cond":"Typical","Paved_Drive":"Paved","Wood_Deck_SF":393,"Open_Porch_SF":36,"Enclosed_Porch":0,"Three_season_porch":0,"Screen_Porch":0,"Pool_Area":0,"Pool_QC":"No_Pool","Fence":"No_Fence","Misc_Feature":"Gar2","Misc_Val":12500,"Mo_Sold":6,"Year_Sold":2010,"Sale_Type":"WD ","Sale_Condition":"Normal","Sale_Price":172000,"Longitude":-93.6194,"Latitude":42.0527,"Sale_Price_log":5.2355}
{"MS_SubClass":"One_Story_PUD_1946_and_Newer","MS_Zoning":"Residential_Low_Density","Lot_Frontage":39,"Lot_Area":5389,"Street":"Pave","Alley":"No_Alley_Access","Lot_Shape":"Slightly_Irregular","Land_Contour":"Lvl","Utilities":"AllPub","Lot_Config":"Inside","Land_Slope":"Gtl","Neighborhood":"Stone_Brook","Condition_1":"Norm","Condition_2":"Norm","Bldg_Type":"TwnhsE","House_Style":"One_Story","Overall_Cond":"Average","Year_Built":1995,"Year_Remod_Add":1996,"Roof_Style":"Gable","Roof_Matl":"CompShg","Exterior_1st":"CemntBd","Exterior_2nd":"CmentBd","Mas_Vnr_Type":"None","Mas_Vnr_Area":0,"Exter_Cond":"Typical","Foundation":"PConc","Bsmt_Cond":"Typical","Bsmt_Exposure":"No","BsmtFin_Type_1":"GLQ","BsmtFin_SF_1":3,"BsmtFin_Type_2":"Unf","BsmtFin_SF_2":0,"Bsmt_Unf_SF":415,"Total_Bsmt_SF":1595,"Heating":"GasA","Heating_QC":"Excellent","Central_Air":"Y","Electrical":"SBrkr","First_Flr_SF":1616,"Second_Flr_SF":0,"Gr_Liv_Area":1616,"Bsmt_Full_Bath":1,"Bsmt_Half_Bath":0,"Full_Bath":2,"Half_Bath":0,"Bedroom_AbvGr":2,"Kitchen_AbvGr":1,"TotRms_AbvGrd":5,"Functional":"Typ","Fireplaces":1,"Garage_Type":"Attchd","Garage_Finish":"RFn","Garage_Cars":2,"Garage_Area":608,"Garage_Cond":"Typical","Paved_Drive":"Paved","Wood_Deck_SF":237,"Open_Porch_SF":152,"Enclosed_Porch":0,"Three_season_porch":0,"Screen_Porch":0,"Pool_Area":0,"Pool_QC":"No_Pool","Fence":"No_Fence","Misc_Feature":"None","Misc_Val":0,"Mo_Sold":3,"Year_Sold":2010,"Sale_Type":"WD ","Sale_Condition":"Normal","Sale_Price":236500,"Longitude":-93.6329,"Latitude":42.0611,"Sale_Price_log":5.3738}
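
For local inspection, a JSON-lines file like this could be loaded into an R data frame roughly as follows. This is a minimal sketch, not the repository's code: it assumes the jsonlite package is installed (it is not named in the install commands above), and it uses two records trimmed to a handful of the fields shown instead of the real file. To read the actual file, `file("sample.json")` would take the place of the text connection.

```r
# Assumption: jsonlite is available; this README does not list it.
library(jsonlite)

# Two records trimmed to a few of the fields shown above (illustration only).
json_lines <- c(
  '{"Lot_Area":14267,"Neighborhood":"North_Ames","Year_Built":1958,"Sale_Price":172000}',
  '{"Lot_Area":5389,"Neighborhood":"Stone_Brook","Year_Built":1995,"Sale_Price":236500}'
)

# stream_in() parses one JSON record per line and returns a data frame.
con <- textConnection(json_lines)
houses <- stream_in(con, verbose = FALSE)
close(con)

print(houses)
```

Each line becomes one row, and each JSON key becomes a column, which is the shape a scoring function would typically receive.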

The input data to the metrics job is sample_scored.json, which is a JSON-lines file (one-line JSON records). Here are the first two records:

{"ground_truth":172000,"prediction":143116.3251,"MS_SubClass":"One_Story_1946_and_Newer_All_Styles","MS_Zoning":"Residential_Low_Density","Lot_Frontage":81,"Lot_Area":14267,"Street":"Pave","Alley":"No_Alley_Access","Lot_Shape":"Slightly_Irregular","Land_Contour":"Lvl","Utilities":"AllPub","Lot_Config":"Corner","Land_Slope":"Gtl","Neighborhood":"North_Ames","Condition_1":"Norm","Condition_2":"Norm","Bldg_Type":"OneFam","House_Style":"One_Story","Overall_Cond":"Above_Average","Year_Built":1958,"Year_Remod_Add":1958,"Roof_Style":"Hip","Roof_Matl":"CompShg","Exterior_1st":"Wd Sdng","Exterior_2nd":"Wd Sdng","Mas_Vnr_Type":"BrkFace","Mas_Vnr_Area":108,"Exter_Cond":"Typical","Foundation":"CBlock","Bsmt_Cond":"Typical","Bsmt_Exposure":"No","BsmtFin_Type_1":"ALQ","BsmtFin_SF_1":1,"BsmtFin_Type_2":"Unf","BsmtFin_SF_2":0,"Bsmt_Unf_SF":406,"Total_Bsmt_SF":1329,"Heating":"GasA","Heating_QC":"Typical","Central_Air":"Y","Electrical":"SBrkr","First_Flr_SF":1329,"Second_Flr_SF":0,"Gr_Liv_Area":1329,"Bsmt_Full_Bath":0,"Bsmt_Half_Bath":0,"Full_Bath":1,"Half_Bath":1,"Bedroom_AbvGr":3,"Kitchen_AbvGr":1,"TotRms_AbvGrd":6,"Functional":"Typ","Fireplaces":0,"Garage_Type":"Attchd","Garage_Finish":"Unf","Garage_Cars":1,"Garage_Area":312,"Garage_Cond":"Typical","Paved_Drive":"Paved","Wood_Deck_SF":393,"Open_Porch_SF":36,"Enclosed_Porch":0,"Three_season_porch":0,"Screen_Porch":0,"Pool_Area":0,"Pool_QC":"No_Pool","Fence":"No_Fence","Misc_Feature":"Gar2","Misc_Val":12500,"Mo_Sold":6,"Year_Sold":2010,"Sale_Type":"WD ","Sale_Condition":"Normal","Longitude":-93.6194,"Latitude":42.0527,"Sale_Price_log":5.2355}
{"ground_truth":236500,"prediction":254932.8695,"MS_SubClass":"One_Story_PUD_1946_and_Newer","MS_Zoning":"Residential_Low_Density","Lot_Frontage":39,"Lot_Area":5389,"Street":"Pave","Alley":"No_Alley_Access","Lot_Shape":"Slightly_Irregular","Land_Contour":"Lvl","Utilities":"AllPub","Lot_Config":"Inside","Land_Slope":"Gtl","Neighborhood":"Stone_Brook","Condition_1":"Norm","Condition_2":"Norm","Bldg_Type":"TwnhsE","House_Style":"One_Story","Overall_Cond":"Average","Year_Built":1995,"Year_Remod_Add":1996,"Roof_Style":"Gable","Roof_Matl":"CompShg","Exterior_1st":"CemntBd","Exterior_2nd":"CmentBd","Mas_Vnr_Type":"None","Mas_Vnr_Area":0,"Exter_Cond":"Typical","Foundation":"PConc","Bsmt_Cond":"Typical","Bsmt_Exposure":"No","BsmtFin_Type_1":"GLQ","BsmtFin_SF_1":3,"BsmtFin_Type_2":"Unf","BsmtFin_SF_2":0,"Bsmt_Unf_SF":415,"Total_Bsmt_SF":1595,"Heating":"GasA","Heating_QC":"Excellent","Central_Air":"Y","Electrical":"SBrkr","First_Flr_SF":1616,"Second_Flr_SF":0,"Gr_Liv_Area":1616,"Bsmt_Full_Bath":1,"Bsmt_Half_Bath":0,"Full_Bath":2,"Half_Bath":0,"Bedroom_AbvGr":2,"Kitchen_AbvGr":1,"TotRms_AbvGrd":5,"Functional":"Typ","Fireplaces":1,"Garage_Type":"Attchd","Garage_Finish":"RFn","Garage_Cars":2,"Garage_Area":608,"Garage_Cond":"Typical","Paved_Drive":"Paved","Wood_Deck_SF":237,"Open_Porch_SF":152,"Enclosed_Porch":0,"Three_season_porch":0,"Screen_Porch":0,"Pool_Area":0,"Pool_QC":"No_Pool","Fence":"No_Fence","Misc_Feature":"None","Misc_Val":0,"Mo_Sold":3,"Year_Sold":2010,"Sale_Type":"WD ","Sale_Condition":"Normal","Longitude":-93.6329,"Latitude":42.0611,"Sale_Price_log":5.3738}
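
To illustrate what a metrics job computes from these columns, the sketch below derives RMSE, MAE, and R2 in base R from the ground_truth and prediction values of the two records above. The helper name `regression_metrics` is hypothetical and is not the function defined in house_price.R. R2 is computed here as 1 - SS_res / SS_tot; some packages (yardstick, for example) also provide a squared-correlation variant, so the value reported by the actual metrics job may differ.

```r
# Values copied from the two sample_scored.json records shown above.
scored <- data.frame(
  ground_truth = c(172000, 236500),
  prediction   = c(143116.3251, 254932.8695)
)

# Hypothetical helper (not the repository's function): standard regression
# metrics computed from actual and predicted values.
regression_metrics <- function(actual, predicted) {
  residuals <- actual - predicted
  list(
    RMSE = sqrt(mean(residuals^2)),
    MAE  = mean(abs(residuals)),
    # Coefficient of determination: 1 - SS_residual / SS_total.
    R2   = 1 - sum(residuals^2) / sum((actual - mean(actual))^2)
  )
}

metrics <- regression_metrics(scored$ground_truth, scored$prediction)
print(metrics)
```

With only two rows the numbers are not meaningful as a model evaluation; the point is the shape of the calculation, which applies unchanged to the full scored dataset.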
