Supporting Migration from XML Package
The XML package was developed by Duncan Temple Lang. The CRAN Team took over maintenance in 2013, when he was no longer able to maintain it. The CRAN Team hoped that, in time, packages depending on XML would migrate to alternatives, but this did not happen. Following a recent call for volunteers, Ivan Krylov has taken over maintenance of XML. However, XML will remain in "maintenance mode", and package authors are encouraged to switch to xml2, which is actively developed by a strong team.
This project will support this effort by contributing patches to packages depending on XML, implementing the switch to xml2. In addition, example mappings from XML to xml2 code will be documented, to help package authors make the switch themselves.
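As an illustration of the kind of mapping that could be documented, here is a minimal sketch pairing some common XML calls with rough xml2 counterparts. The sample document and variable names are made up for this example. The pairings are approximate: the two packages return different object types, so a real patch would need to check each call site.

```r
library(xml2)

# Hypothetical input, used only for illustration
xml_string <- "<catalog><book id='b1'><title>R Programming</title></book><book id='b2'><title>XML Basics</title></book></catalog>"

# XML::xmlParse(xml_string, asText = TRUE)  ~  xml2::read_xml(xml_string)
doc <- read_xml(xml_string)

# XML::xmlRoot(doc)  ~  xml2::xml_root(doc)
root <- xml_root(doc)

# XML::xmlName(root)  ~  xml2::xml_name(root)
root_name <- xml_name(root)

# XML::getNodeSet(doc, "//book")  ~  xml2::xml_find_all(doc, "//book")
books <- xml_find_all(doc, "//book")

# XML::xpathSApply(doc, "//title", XML::xmlValue)
#   ~  xml2::xml_text(xml2::xml_find_all(doc, "//title"))
titles <- xml_text(xml_find_all(doc, "//title"))

# XML::xmlGetAttr(node, "id")  ~  xml2::xml_attr(node, "id")
# (xml_attr is vectorised over a node set)
ids <- xml_attr(books, "id")
```

Note that xml2 functions are generally vectorised over node sets, so loops written with sapply() over XML nodes can often be collapsed into a single call.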
Daniel Nüst documented his approach and lessons learned in switching the sos4R package from XML to xml2. In his exploration of CRAN-maintained packages, Lluís Revilla Sancho examined packages that depend on XML, e.g., identifying important packages that depend on XML as well as packages that would not work without it.
The project would identify one or more packages that could migrate from XML. After contacting the maintainer to ensure a patch is welcome, the package code would be updated to migrate from XML, most likely by switching to xml2, although new code could potentially be written to remove the dependency altogether.
A good initial candidate may be a package that depends on a small number of XML functions. A search for "importFrom(XML" on GitHub reveals that some packages appear to use only one function from XML.
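For a package whose source is available locally, the same check can be made by scanning its NAMESPACE file. The following is a minimal sketch in base R. The NAMESPACE lines and the helper name xml_imports() are hypothetical, made up for illustration, and calls of the form XML::fun() in the package code would not be detected this way.

```r
# Hypothetical NAMESPACE contents; in practice use readLines("NAMESPACE")
namespace_lines <- c(
  "export(read_feed)",
  "importFrom(XML,xmlParse)",
  "importFrom(XML, xmlValue, xmlRoot)",
  "importFrom(stats,median)"
)

# Hypothetical helper: list the functions imported from XML
xml_imports <- function(lines) {
  # Keep only importFrom(XML, ...) directives
  hits <- grep('^\\s*importFrom\\(\\s*"?XML"?\\s*,', lines, value = TRUE)
  # Strip the directive wrapper, leaving the comma-separated function names
  args <- sub('^\\s*importFrom\\(\\s*"?XML"?\\s*,(.*)\\)\\s*$', "\\1", hits)
  funs <- trimws(unlist(strsplit(args, ",")))
  # Drop optional quotes around names and de-duplicate
  sort(unique(gsub('"', "", funs)))
}

imported <- xml_imports(namespace_lines)
print(imported)
```

A package for which this returns only one or two names is likely to be a simpler first target than one that imports the whole XML namespace with import(XML).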
After successfully contributing a patch for a simple case, more complex cases could be taken on.
An appropriate package or number of packages could be selected to make this a small, medium, or large GSoC project.
The number of packages depending on XML will fall, easing the maintenance burden for a package that has not been under active development for over a decade.
- EVALUATING MENTOR: Heather Turner [email protected] chairs the R Contribution Working Group and is an author of several CRAN packages, notably the statistical modelling packages gnm, BradleyTerry2 and PlackettLuce. She was a GSoC co-mentor in 2021-2023.
- gwynn gebeyehu [email protected] has a PhD in statistics from the University of Auckland (the birthplace of R). She completed a post-doc at Harvard Business School, and is involved in the R Contribution Working Group and other parts of the R community.
Contributors, please do one or more of the following tests before contacting the mentors above.
- Easy: Using the example data set in Section 41.1.3 of Computing with Data, write an R script using functions from the xml2 package to extract the name of the director of the movie "Y tu mama tambien".
- Medium: Replicate as much as you can of the analysis in Section 41.1.3 using functions from the XML package.
- Hard: Given the XML code
z <- '
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tylor</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
</CATALOG>
'
Write an R function that reproduces the output of
res <- rlist::list.parse(z, type='xml')
that does not depend on the XML package. Ideally, the output of your function should exactly match the structure of res, i.e. pass the test identical(res, res2), where res2 is the output of your function.
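When identical() returns FALSE, it helps to see where two structures differ. The sketch below shows one way to do this in base R. The two lists are hypothetical stand-ins, not the actual output of rlist::list.parse(), and they differ only in the type of one element.

```r
# Hypothetical stand-ins for res (reference) and res2 (candidate output)
res  <- list(CD = list(TITLE = "Empire Burlesque", YEAR = "1985"))
res2 <- list(CD = list(TITLE = "Empire Burlesque", YEAR = 1985))

# Strict check: element types, names and attributes must all match
same <- identical(res, res2)

# all.equal() returns TRUE or a character vector describing the differences
differences <- all.equal(res, res2)
print(differences)

# str() shows the nested structure and element types side by side
str(res)
str(res2)
```

Here identical() fails because YEAR is a character string in one list and a number in the other, which is the kind of mismatch that str() makes easy to spot.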
Contributors, please post a link to your test results here.
- Tushar Banik: GitHub Profile, Analysis, Easy Test solution, Medium Test solution, Hard Test solution
- Caroline Guerra: easy solution: https://github.com/Caroline-Guerra/GSoC-2024/blob/main/Easy%20Solution.pdf, medium solution: https://github.com/Caroline-Guerra/GSoC-2024/blob/main/Medium%20Solution.pdf, hard solution: https://github.com/Caroline-Guerra/GSoC-2024/blob/main/Hard%20Solution.pdf