This repository has been archived by the owner on Apr 17, 2018. It is now read-only.

Seeing error in getDomain #26

Open
keith-turner opened this issue Nov 10, 2015 · 4 comments

Comments

@keith-turner
Member

Repeatedly seeing the following error in getDomain.

16:46:24.087 [pool-10-thread-38] WARN  io.fluo.core.worker.WorkTask - Failed to execute observer CollisionFreeMapObserver notification : um:u:4 fluoRecipes cfm:um  153397
java.lang.RuntimeException: java.text.ParseException: Invalid host: whattoexpect.co.au
        at io.fluo.webindex.data.fluo.UriMap$UriUpdateObserver.getDomain(UriMap.java:148) ~[webindex-data-0.0.1-SNAPSHOT.jar:na]
        at io.fluo.webindex.data.fluo.UriMap$UriUpdateObserver.updatingValues(UriMap.java:135) ~[webindex-data-0.0.1-SNAPSHOT.jar:na]
        at io.fluo.recipes.map.CollisionFreeMap.process(CollisionFreeMap.java:203) ~[fluo-recipes-core-1.0.0-beta-1-SNAPSHOT.jar:1.0.0-beta-1-SNAPSHOT]
        at io.fluo.recipes.map.CollisionFreeMapObserver.process(CollisionFreeMapObserver.java:44) ~[fluo-recipes-core-1.0.0-beta-1-SNAPSHOT.jar:1.0.0-beta-1-SNAPSHOT]
        at io.fluo.core.worker.WorkTask.run(WorkTask.java:69) ~[fluo-core-1.0.0-beta-2-SNAPSHOT.jar:1.0.0-beta-2-SNAPSHOT]
        at io.fluo.core.worker.NotificationProcessor$2.run(NotificationProcessor.java:131) [fluo-core-1.0.0-beta-2-SNAPSHOT.jar:1.0.0-beta-2-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_51]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
Caused by: java.text.ParseException: Invalid host: whattoexpect.co.au
        at io.fluo.webindex.data.util.LinkUtil.createURL(LinkUtil.java:42) ~[webindex-data-0.0.1-SNAPSHOT.jar:na]
        at io.fluo.webindex.data.util.LinkUtil.getHost(LinkUtil.java:78) ~[webindex-data-0.0.1-SNAPSHOT.jar:na]
        at io.fluo.webindex.data.util.LinkUtil.hasIP(LinkUtil.java:87) ~[webindex-data-0.0.1-SNAPSHOT.jar:na]
        at io.fluo.webindex.data.util.LinkUtil.getReverseTopPrivate(LinkUtil.java:98) ~[webindex-data-0.0.1-SNAPSHOT.jar:na]
        at io.fluo.webindex.data.fluo.UriMap$UriUpdateObserver.getDomain(UriMap.java:146) ~[webindex-data-0.0.1-SNAPSHOT.jar:na]
        ... 8 common frames omitted

@keith-turner
Member Author

This issue is caused by Spark and Fluo using different versions of Guava. When the Spark job runs, it uses Guava 11 (because of YARN, I think), and that older version accepts the domain. When a Fluo observer processes the data, a newer version of Guava is in use, and it rejects the same domain as invalid.
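
One way to confirm a mismatch like this is to print where a class was actually loaded from in each environment (the Spark job and the Fluo worker). The snippet below is a minimal sketch using only the JDK. The helper name `ClassOrigin` is made up for illustration, and the default class name `com.google.common.net.InternetDomainName` is an assumption about which Guava class is involved; the thread does not say.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.Optional;

/** Hypothetical helper: reports which jar or directory a class was loaded from. */
final class ClassOrigin {

  /**
   * Returns the code source location of the named class, or empty if the class
   * cannot be loaded or has no code source (as is the case for JDK classes).
   */
  static Optional<URL> locate(String className) {
    try {
      // Load without initializing, so no static initializers run.
      Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
      CodeSource src = cls.getProtectionDomain().getCodeSource();
      return src == null ? Optional.empty() : Optional.ofNullable(src.getLocation());
    } catch (ClassNotFoundException | LinkageError e) {
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    // Assumed default: the Guava class suspected of behaving differently.
    String name = args.length > 0 ? args[0] : "com.google.common.net.InternetDomainName";
    String where = locate(name).map(URL::toString).orElse("not found or no code source");
    System.out.println(name + " -> " + where);
  }
}
```

Running this with the same classpath as each process should show whether the two sides resolve the class from different Guava jars.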

@keith-turner
Member Author

I experimented with making my webindex observers use Guava 11 (which is what Spark is using). However, the Fluo framework uses Guava 13, which makes it difficult for an application to use Guava 11. Dependency isolation (apache/fluo#536) would be nice.

keith-turner added a commit to keith-turner/webindex that referenced this issue Nov 17, 2015
mikewalch added a commit that referenced this issue Nov 17, 2015
#26 log error and continue when error getting domain
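
The "log error and continue" approach named in that commit message can be sketched roughly as follows. This is not the code from the commit. `DomainSkipExample`, `safeDomain`, and `domainsOf` are hypothetical names, and `getDomain` here is a trivial stand-in for the real lookup so that the example runs on its own.

```java
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

final class DomainSkipExample {
  private static final Logger LOG = Logger.getLogger(DomainSkipExample.class.getName());

  /** Stand-in for the real lookup (an assumption): returns the last two labels of the host. */
  static String getDomain(String host) throws ParseException {
    String[] labels = host.split("\\.");
    if (labels.length < 2) {
      throw new ParseException("Invalid host: " + host, 0);
    }
    for (String label : labels) {
      if (label.isEmpty()) {
        throw new ParseException("Invalid host: " + host, 0);
      }
    }
    return labels[labels.length - 2] + "." + labels[labels.length - 1];
  }

  /** Logs and returns empty instead of letting the failure propagate. */
  static Optional<String> safeDomain(String host) {
    try {
      return Optional.of(getDomain(host));
    } catch (ParseException e) {
      LOG.warning("Skipping host, could not get domain: " + e.getMessage());
      return Optional.empty();
    }
  }

  /** Processes every host and keeps going past the ones that fail. */
  static List<String> domainsOf(List<String> hosts) {
    List<String> domains = new ArrayList<>();
    for (String host : hosts) {
      safeDomain(host).ifPresent(domains::add);
    }
    return domains;
  }
}
```

The trade-off is that bad hosts are dropped with only a log line, so one unparseable domain no longer fails the whole notification, but the dropped entries have to be found in the logs.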
@keith-turner
Member Author

Found the following site. We may be able to use the Java software plus the list from that site. The nice thing is that the list is not built into the software, as it is in Guava.

https://publicsuffix.org/
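
To illustrate why an external list helps, here is a minimal sketch of a lookup whose rules are supplied at runtime, so both the Spark job and the observers could read the same file. It follows the matching rules described on publicsuffix.org (plain rules, `*.` wildcards, `!` exceptions, and an implicit `*` default). The class name `SuffixListSketch` is made up, the rules in `main` are a tiny hand-written sample rather than the real list, and internationalized names are not handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: registrable-domain lookup driven by externally supplied suffix rules. */
final class SuffixListSketch {
  private final Set<String> rules = new HashSet<>();

  /** Accepts lines in the list's text format; blank lines and "//" comments are ignored. */
  SuffixListSketch(Iterable<String> lines) {
    for (String line : lines) {
      String rule = line.trim().toLowerCase(Locale.ROOT);
      if (!rule.isEmpty() && !rule.startsWith("//")) {
        rules.add(rule);
      }
    }
  }

  /** Returns the public suffix plus one label, or empty if the host is itself a suffix. */
  Optional<String> registrableDomain(String host) {
    String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
    int n = labels.length;
    int suffixLen = 1;      // implicit "*" rule: the last label is the suffix
    boolean found = false;  // true once the longest normal rule has matched
    for (int i = 0; i < n; i++) {
      String candidate = join(labels, i);
      if (rules.contains("!" + candidate)) {
        suffixLen = n - i - 1; // exception rule wins: drop its leftmost label
        break;
      }
      boolean wildcard = i + 1 < n && rules.contains("*." + join(labels, i + 1));
      if (!found && (rules.contains(candidate) || wildcard)) {
        suffixLen = n - i;
        found = true;
      }
    }
    if (n <= suffixLen) {
      return Optional.empty();
    }
    return Optional.of(join(labels, n - suffixLen - 1));
  }

  private static String join(String[] labels, int from) {
    return String.join(".", Arrays.copyOfRange(labels, from, labels.length));
  }

  public static void main(String[] args) {
    // Sample rules only; a real deployment would read the published list file.
    SuffixListSketch psl =
        new SuffixListSketch(Arrays.asList("// sample", "com", "au", "com.au", "*.ck", "!www.ck"));
    for (String host : new String[] {"www.example.com.au", "com.au", "a.b.ck"}) {
      System.out.println(host + " -> " + psl.registrableDomain(host));
    }
  }
}
```

Because the rules are data rather than code, the result no longer depends on which library version happens to be on the classpath, only on which copy of the list is loaded.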

@tfmorris

Java lib that uses public suffix list: https://github.com/hamano/regdom4j/
