diff --git a/DESCRIPTION b/DESCRIPTION index 293c609..8180774 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,26 +1,18 @@ -Package: muHVT +Package: HVT Type: Package -Date: 2023-07-07 +Date: 2023-10-15 Title: Constructing Hierarchical Voronoi Tessellations and Overlay Heatmap for Data Analysis -Version: (v23.06.07) +Version: 3.0.1 Authors@R: c( - person("Zubin", "Dowlaty", email = "zubin.dowlaty@mu-sigma.com", role = "aut"), - person("Shubhra", "Prakash", email = "shubhraprakash@live.com", role = "ctb"), - person("Sangeet Moy", "Das", email = "dassangeet768@gmail.com", role = "ctb"), - person("Sunuganty Achyut", "Raj", email = "achyut.raj92@gmail.com", role = "ctb"), - person("Shantanu", "Vaidya", email = "snv.shantanu@gmail.com", role = "ctb"), - person("Somya", "Shambhawi", email = "shambhawisomya@gmail.com", role = "ctb") - person("Praditi", "Shah", email = "shahpraditi@gmail.com", role = "ctb"), - person("Avinash", "Joshi", email = "avinash.joshi@mu-sigma.com", role = "ctb"), - person("Meet", "Dave", email = "meetdave06@gmail.com", role = "ctb"), - person("Mu Sigma, Inc.", email = "ird.experiencelab@mu-sigma.com", role = "cre")) -Description: The muHVT package is a collection of R functions to facilitate building topology preserving maps for rich multivariate data.See for more information. Credits to Mu Sigma for their continuous support throughout the development of the package. + person("Zubin", "Dowlaty", email = "zubin.dowlaty@mu-sigma.com", role = "aut"), + person("Mu Sigma, Inc.", email = "ird.experiencelab@mu-sigma.com", role = "cre")) +Description: The HVT package is a collection of R functions to facilitate building topology preserving maps for rich multivariate data.See for more information. Credits to Mu Sigma for their continuous support throughout the development of the package. 
License: Apache License 2.0 Encoding: UTF-8 Imports: MASS, deldir, grDevices, splancs, sp, conf.design, Hmisc, - stats, dplyr, purrr, magrittr, polyclip, - ggplot2, tidyr, scales, cluster, reshape2, plyr + stats, dplyr, purrr, magrittr, polyclip, ggplot2, tidyr, + scales, cluster, reshape2, plyr Depends: R (>= 3.6.0) BugReports: https://github.com/Mu-Sigma/muHVT/issues URL: https://github.com/Mu-Sigma/muHVT @@ -29,17 +21,9 @@ Suggests: knitr, rmarkdown, testthat, geozoo, kableExtra, plotly, data.table VignetteBuilder: knitr NeedsCompilation: no -Packaged: 2023-07-07 17:45:51 UTC; somya +Packaged: 2023-10-15 12:35:23 UTC; ponanureka Author: Zubin Dowlaty [aut], - Shubhra Prakash [ctb], - Sangeet Moy Das [ctb], - Sunuganty Achyut Raj [ctb], - Shantanu Vaidya [ctb], - Praditi Shah [ctb], - Avinash Joshi [ctb], - Somya Shambhawi [ctb] - Meet Dave [ctb], - Mu Sigma, Inc. [cre] + Mu Sigma, Inc. [cre] Maintainer: "Mu Sigma, Inc." Repository: CRAN -Date/Publication: 2023-07-07 20:10:03 UTC +Date/Publication: 2023-10-15 20:10:03 UTC diff --git a/muHVT.Rproj b/HVT.Rproj similarity index 100% rename from muHVT.Rproj rename to HVT.Rproj diff --git a/HVT_3.0.1.tar.gz b/HVT_3.0.1.tar.gz new file mode 100644 index 0000000..f646977 Binary files /dev/null and b/HVT_3.0.1.tar.gz differ diff --git a/README.html b/README.html index db413ae..7d2a857 100644 --- a/README.html +++ b/README.html @@ -30,7 +30,7 @@ !function(e,t){"use strict";"object"==typeof module&&"object"==typeof module.exports?module.exports=e.document?t(e,!0):function(e){if(!e.document)throw new Error("jQuery requires a window with a document");return t(e)}:t(e)}("undefined"!=typeof window?window:this,function(C,e){"use strict";var t=[],r=Object.getPrototypeOf,s=t.slice,g=t.flat?function(e){return t.flat.call(e)}:function(e){return t.concat.apply([],e)},u=t.push,i=t.indexOf,n={},o=n.toString,v=n.hasOwnProperty,a=v.toString,l=a.call(Object),y={},m=function(e){return"function"==typeof e&&"number"!=typeof 
e.nodeType&&"function"!=typeof e.item},x=function(e){return null!=e&&e===e.window},E=C.document,c={type:!0,src:!0,nonce:!0,noModule:!0};function b(e,t,n){var r,i,o=(n=n||E).createElement("script");if(o.text=e,t)for(r in c)(i=t[r]||t.getAttribute&&t.getAttribute(r))&&o.setAttribute(r,i);n.head.appendChild(o).parentNode.removeChild(o)}function w(e){return null==e?e+"":"object"==typeof e||"function"==typeof e?n[o.call(e)]||"object":typeof e}var f="3.6.0",S=function(e,t){return new S.fn.init(e,t)};function p(e){var t=!!e&&"length"in e&&e.length,n=w(e);return!m(e)&&!x(e)&&("array"===n||0===t||"number"==typeof t&&0+~]|"+M+")"+M+"*"),U=new RegExp(M+"|>"),X=new RegExp(F),V=new RegExp("^"+I+"$"),G={ID:new RegExp("^#("+I+")"),CLASS:new RegExp("^\\.("+I+")"),TAG:new RegExp("^("+I+"|[*])"),ATTR:new RegExp("^"+W),PSEUDO:new RegExp("^"+F),CHILD:new RegExp("^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\("+M+"*(even|odd|(([+-]|)(\\d*)n|)"+M+"*(?:([+-]|)"+M+"*(\\d+)|))"+M+"*\\)|)","i"),bool:new RegExp("^(?:"+R+")$","i"),needsContext:new RegExp("^"+M+"*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\("+M+"*((?:-\\d)?\\d*)"+M+"*\\)|)(?=[^-]|$)","i")},Y=/HTML$/i,Q=/^(?:input|select|textarea|button)$/i,J=/^h\d$/i,K=/^[^{]+\{\s*\[native \w/,Z=/^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/,ee=/[+~]/,te=new RegExp("\\\\[\\da-fA-F]{1,6}"+M+"?|\\\\([^\\r\\n\\f])","g"),ne=function(e,t){var n="0x"+e.slice(1)-65536;return t||(n<0?String.fromCharCode(n+65536):String.fromCharCode(n>>10|55296,1023&n|56320))},re=/([\0-\x1f\x7f]|^-?\d)|^-$|[^\0-\x1f\x7f-\uFFFF\w-]/g,ie=function(e,t){return t?"\0"===e?"\ufffd":e.slice(0,-1)+"\\"+e.charCodeAt(e.length-1).toString(16)+" ":"\\"+e},oe=function(){T()},ae=be(function(e){return!0===e.disabled&&"fieldset"===e.nodeName.toLowerCase()},{dir:"parentNode",next:"legend"});try{H.apply(t=O.call(p.childNodes),p.childNodes),t[p.childNodes.length].nodeType}catch(e){H={apply:t.length?function(e,t){L.apply(e,O.call(t))}:function(e,t){var 
n=e.length,r=0;while(e[n++]=t[r++]);e.length=n-1}}}function se(t,e,n,r){var i,o,a,s,u,l,c,f=e&&e.ownerDocument,p=e?e.nodeType:9;if(n=n||[],"string"!=typeof t||!t||1!==p&&9!==p&&11!==p)return n;if(!r&&(T(e),e=e||C,E)){if(11!==p&&(u=Z.exec(t)))if(i=u[1]){if(9===p){if(!(a=e.getElementById(i)))return n;if(a.id===i)return n.push(a),n}else if(f&&(a=f.getElementById(i))&&y(e,a)&&a.id===i)return n.push(a),n}else{if(u[2])return H.apply(n,e.getElementsByTagName(t)),n;if((i=u[3])&&d.getElementsByClassName&&e.getElementsByClassName)return H.apply(n,e.getElementsByClassName(i)),n}if(d.qsa&&!N[t+" "]&&(!v||!v.test(t))&&(1!==p||"object"!==e.nodeName.toLowerCase())){if(c=t,f=e,1===p&&(U.test(t)||z.test(t))){(f=ee.test(t)&&ye(e.parentNode)||e)===e&&d.scope||((s=e.getAttribute("id"))?s=s.replace(re,ie):e.setAttribute("id",s=S)),o=(l=h(t)).length;while(o--)l[o]=(s?"#"+s:":scope")+" "+xe(l[o]);c=l.join(",")}try{return H.apply(n,f.querySelectorAll(c)),n}catch(e){N(t,!0)}finally{s===S&&e.removeAttribute("id")}}}return g(t.replace($,"$1"),e,n,r)}function ue(){var r=[];return function e(t,n){return r.push(t+" ")>b.cacheLength&&delete e[r.shift()],e[t+" "]=n}}function le(e){return e[S]=!0,e}function ce(e){var t=C.createElement("fieldset");try{return!!e(t)}catch(e){return!1}finally{t.parentNode&&t.parentNode.removeChild(t),t=null}}function fe(e,t){var n=e.split("|"),r=n.length;while(r--)b.attrHandle[n[r]]=t}function pe(e,t){var n=t&&e,r=n&&1===e.nodeType&&1===t.nodeType&&e.sourceIndex-t.sourceIndex;if(r)return r;if(n)while(n=n.nextSibling)if(n===t)return-1;return e?1:-1}function de(t){return function(e){return"input"===e.nodeName.toLowerCase()&&e.type===t}}function he(n){return function(e){var t=e.nodeName.toLowerCase();return("input"===t||"button"===t)&&e.type===n}}function ge(t){return function(e){return"form"in e?e.parentNode&&!1===e.disabled?"label"in e?"label"in e.parentNode?e.parentNode.disabled===t:e.disabled===t:e.isDisabled===t||e.isDisabled!==!t&&ae(e)===t:e.disabled===t:"label"in 
e&&e.disabled===t}}function ve(a){return le(function(o){return o=+o,le(function(e,t){var n,r=a([],e.length,o),i=r.length;while(i--)e[n=r[i]]&&(e[n]=!(t[n]=e[n]))})})}function ye(e){return e&&"undefined"!=typeof e.getElementsByTagName&&e}for(e in d=se.support={},i=se.isXML=function(e){var t=e&&e.namespaceURI,n=e&&(e.ownerDocument||e).documentElement;return!Y.test(t||n&&n.nodeName||"HTML")},T=se.setDocument=function(e){var t,n,r=e?e.ownerDocument||e:p;return r!=C&&9===r.nodeType&&r.documentElement&&(a=(C=r).documentElement,E=!i(C),p!=C&&(n=C.defaultView)&&n.top!==n&&(n.addEventListener?n.addEventListener("unload",oe,!1):n.attachEvent&&n.attachEvent("onunload",oe)),d.scope=ce(function(e){return a.appendChild(e).appendChild(C.createElement("div")),"undefined"!=typeof e.querySelectorAll&&!e.querySelectorAll(":scope fieldset div").length}),d.attributes=ce(function(e){return e.className="i",!e.getAttribute("className")}),d.getElementsByTagName=ce(function(e){return e.appendChild(C.createComment("")),!e.getElementsByTagName("*").length}),d.getElementsByClassName=K.test(C.getElementsByClassName),d.getById=ce(function(e){return a.appendChild(e).id=S,!C.getElementsByName||!C.getElementsByName(S).length}),d.getById?(b.filter.ID=function(e){var t=e.replace(te,ne);return function(e){return e.getAttribute("id")===t}},b.find.ID=function(e,t){if("undefined"!=typeof t.getElementById&&E){var n=t.getElementById(e);return n?[n]:[]}}):(b.filter.ID=function(e){var n=e.replace(te,ne);return function(e){var t="undefined"!=typeof e.getAttributeNode&&e.getAttributeNode("id");return t&&t.value===n}},b.find.ID=function(e,t){if("undefined"!=typeof t.getElementById&&E){var n,r,i,o=t.getElementById(e);if(o){if((n=o.getAttributeNode("id"))&&n.value===e)return[o];i=t.getElementsByName(e),r=0;while(o=i[r++])if((n=o.getAttributeNode("id"))&&n.value===e)return[o]}return[]}}),b.find.TAG=d.getElementsByTagName?function(e,t){return"undefined"!=typeof 
t.getElementsByTagName?t.getElementsByTagName(e):d.qsa?t.querySelectorAll(e):void 0}:function(e,t){var n,r=[],i=0,o=t.getElementsByTagName(e);if("*"===e){while(n=o[i++])1===n.nodeType&&r.push(n);return r}return o},b.find.CLASS=d.getElementsByClassName&&function(e,t){if("undefined"!=typeof t.getElementsByClassName&&E)return t.getElementsByClassName(e)},s=[],v=[],(d.qsa=K.test(C.querySelectorAll))&&(ce(function(e){var t;a.appendChild(e).innerHTML="",e.querySelectorAll("[msallowcapture^='']").length&&v.push("[*^$]="+M+"*(?:''|\"\")"),e.querySelectorAll("[selected]").length||v.push("\\["+M+"*(?:value|"+R+")"),e.querySelectorAll("[id~="+S+"-]").length||v.push("~="),(t=C.createElement("input")).setAttribute("name",""),e.appendChild(t),e.querySelectorAll("[name='']").length||v.push("\\["+M+"*name"+M+"*="+M+"*(?:''|\"\")"),e.querySelectorAll(":checked").length||v.push(":checked"),e.querySelectorAll("a#"+S+"+*").length||v.push(".#.+[+~]"),e.querySelectorAll("\\\f"),v.push("[\\r\\n\\f]")}),ce(function(e){e.innerHTML="";var t=C.createElement("input");t.setAttribute("type","hidden"),e.appendChild(t).setAttribute("name","D"),e.querySelectorAll("[name=d]").length&&v.push("name"+M+"*[*^$|!~]?="),2!==e.querySelectorAll(":enabled").length&&v.push(":enabled",":disabled"),a.appendChild(e).disabled=!0,2!==e.querySelectorAll(":disabled").length&&v.push(":enabled",":disabled"),e.querySelectorAll("*,:x"),v.push(",.*:")})),(d.matchesSelector=K.test(c=a.matches||a.webkitMatchesSelector||a.mozMatchesSelector||a.oMatchesSelector||a.msMatchesSelector))&&ce(function(e){d.disconnectedMatch=c.call(e,"*"),c.call(e,"[s!='']:x"),s.push("!=",F)}),v=v.length&&new RegExp(v.join("|")),s=s.length&&new RegExp(s.join("|")),t=K.test(a.compareDocumentPosition),y=t||K.test(a.contains)?function(e,t){var n=9===e.nodeType?e.documentElement:e,r=t&&t.parentNode;return 
e===r||!(!r||1!==r.nodeType||!(n.contains?n.contains(r):e.compareDocumentPosition&&16&e.compareDocumentPosition(r)))}:function(e,t){if(t)while(t=t.parentNode)if(t===e)return!0;return!1},j=t?function(e,t){if(e===t)return l=!0,0;var n=!e.compareDocumentPosition-!t.compareDocumentPosition;return n||(1&(n=(e.ownerDocument||e)==(t.ownerDocument||t)?e.compareDocumentPosition(t):1)||!d.sortDetached&&t.compareDocumentPosition(e)===n?e==C||e.ownerDocument==p&&y(p,e)?-1:t==C||t.ownerDocument==p&&y(p,t)?1:u?P(u,e)-P(u,t):0:4&n?-1:1)}:function(e,t){if(e===t)return l=!0,0;var n,r=0,i=e.parentNode,o=t.parentNode,a=[e],s=[t];if(!i||!o)return e==C?-1:t==C?1:i?-1:o?1:u?P(u,e)-P(u,t):0;if(i===o)return pe(e,t);n=e;while(n=n.parentNode)a.unshift(n);n=t;while(n=n.parentNode)s.unshift(n);while(a[r]===s[r])r++;return r?pe(a[r],s[r]):a[r]==p?-1:s[r]==p?1:0}),C},se.matches=function(e,t){return se(e,null,null,t)},se.matchesSelector=function(e,t){if(T(e),d.matchesSelector&&E&&!N[t+" "]&&(!s||!s.test(t))&&(!v||!v.test(t)))try{var n=c.call(e,t);if(n||d.disconnectedMatch||e.document&&11!==e.document.nodeType)return n}catch(e){N(t,!0)}return 0":{dir:"parentNode",first:!0}," ":{dir:"parentNode"},"+":{dir:"previousSibling",first:!0},"~":{dir:"previousSibling"}},preFilter:{ATTR:function(e){return e[1]=e[1].replace(te,ne),e[3]=(e[3]||e[4]||e[5]||"").replace(te,ne),"~="===e[2]&&(e[3]=" "+e[3]+" "),e.slice(0,4)},CHILD:function(e){return e[1]=e[1].toLowerCase(),"nth"===e[1].slice(0,3)?(e[3]||se.error(e[0]),e[4]=+(e[4]?e[5]+(e[6]||1):2*("even"===e[3]||"odd"===e[3])),e[5]=+(e[7]+e[8]||"odd"===e[3])):e[3]&&se.error(e[0]),e},PSEUDO:function(e){var t,n=!e[6]&&e[2];return G.CHILD.test(e[0])?null:(e[3]?e[2]=e[4]||e[5]||"":n&&X.test(n)&&(t=h(n,!0))&&(t=n.indexOf(")",n.length-t)-n.length)&&(e[0]=e[0].slice(0,t),e[2]=n.slice(0,t)),e.slice(0,3))}},filter:{TAG:function(e){var t=e.replace(te,ne).toLowerCase();return"*"===e?function(){return!0}:function(e){return 
e.nodeName&&e.nodeName.toLowerCase()===t}},CLASS:function(e){var t=m[e+" "];return t||(t=new RegExp("(^|"+M+")"+e+"("+M+"|$)"))&&m(e,function(e){return t.test("string"==typeof e.className&&e.className||"undefined"!=typeof e.getAttribute&&e.getAttribute("class")||"")})},ATTR:function(n,r,i){return function(e){var t=se.attr(e,n);return null==t?"!="===r:!r||(t+="","="===r?t===i:"!="===r?t!==i:"^="===r?i&&0===t.indexOf(i):"*="===r?i&&-1:\x20\t\r\n\f]*)[\x20\t\r\n\f]*\/?>(?:<\/\1>|)$/i;function j(e,n,r){return m(n)?S.grep(e,function(e,t){return!!n.call(e,t,e)!==r}):n.nodeType?S.grep(e,function(e){return e===n!==r}):"string"!=typeof n?S.grep(e,function(e){return-1)[^>]*|#([\w-]+))$/;(S.fn.init=function(e,t,n){var r,i;if(!e)return this;if(n=n||D,"string"==typeof e){if(!(r="<"===e[0]&&">"===e[e.length-1]&&3<=e.length?[null,e,null]:q.exec(e))||!r[1]&&t)return!t||t.jquery?(t||n).find(e):this.constructor(t).find(e);if(r[1]){if(t=t instanceof S?t[0]:t,S.merge(this,S.parseHTML(r[1],t&&t.nodeType?t.ownerDocument||t:E,!0)),N.test(r[1])&&S.isPlainObject(t))for(r in t)m(this[r])?this[r](t[r]):this.attr(r,t[r]);return this}return(i=E.getElementById(r[2]))&&(this[0]=i,this.length=1),this}return e.nodeType?(this[0]=e,this.length=1,this):m(e)?void 0!==n.ready?n.ready(e):e(S):S.makeArray(e,this)}).prototype=S.fn,D=S(E);var L=/^(?:parents|prev(?:Until|All))/,H={children:!0,contents:!0,next:!0,prev:!0};function O(e,t){while((e=e[t])&&1!==e.nodeType);return e}S.fn.extend({has:function(e){var t=S(e,this),n=t.length;return this.filter(function(){for(var 
e=0;e\x20\t\r\n\f]*)/i,he=/^$|^module$|\/(?:java|ecma)script/i;ce=E.createDocumentFragment().appendChild(E.createElement("div")),(fe=E.createElement("input")).setAttribute("type","radio"),fe.setAttribute("checked","checked"),fe.setAttribute("name","t"),ce.appendChild(fe),y.checkClone=ce.cloneNode(!0).cloneNode(!0).lastChild.checked,ce.innerHTML="",y.noCloneChecked=!!ce.cloneNode(!0).lastChild.defaultValue,ce.innerHTML="",y.option=!!ce.lastChild;var ge={thead:[1,"","
"],col:[2,"","
"],tr:[2,"","
"],td:[3,"","
"],_default:[0,"",""]};function ve(e,t){var n;return n="undefined"!=typeof e.getElementsByTagName?e.getElementsByTagName(t||"*"):"undefined"!=typeof e.querySelectorAll?e.querySelectorAll(t||"*"):[],void 0===t||t&&A(e,t)?S.merge([e],n):n}function ye(e,t){for(var n=0,r=e.length;n",""]);var me=/<|&#?\w+;/;function xe(e,t,n,r,i){for(var o,a,s,u,l,c,f=t.createDocumentFragment(),p=[],d=0,h=e.length;d\s*$/g;function je(e,t){return A(e,"table")&&A(11!==t.nodeType?t:t.firstChild,"tr")&&S(e).children("tbody")[0]||e}function De(e){return e.type=(null!==e.getAttribute("type"))+"/"+e.type,e}function qe(e){return"true/"===(e.type||"").slice(0,5)?e.type=e.type.slice(5):e.removeAttribute("type"),e}function Le(e,t){var n,r,i,o,a,s;if(1===t.nodeType){if(Y.hasData(e)&&(s=Y.get(e).events))for(i in Y.remove(t,"handle events"),s)for(n=0,r=s[i].length;n").attr(n.scriptAttrs||{}).prop({charset:n.scriptCharset,src:n.url}).on("load error",i=function(e){r.remove(),i=null,e&&t("error"===e.type?404:200,e.type)}),E.head.appendChild(r[0])},abort:function(){i&&i()}}});var _t,zt=[],Ut=/(=)\?(?=&|$)|\?\?/;S.ajaxSetup({jsonp:"callback",jsonpCallback:function(){var e=zt.pop()||S.expando+"_"+wt.guid++;return this[e]=!0,e}}),S.ajaxPrefilter("json jsonp",function(e,t,n){var r,i,o,a=!1!==e.jsonp&&(Ut.test(e.url)?"url":"string"==typeof e.data&&0===(e.contentType||"").indexOf("application/x-www-form-urlencoded")&&Ut.test(e.data)&&"data");if(a||"jsonp"===e.dataTypes[0])return r=e.jsonpCallback=m(e.jsonpCallback)?e.jsonpCallback():e.jsonpCallback,a?e[a]=e[a].replace(Ut,"$1"+r):!1!==e.jsonp&&(e.url+=(Tt.test(e.url)?"&":"?")+e.jsonp+"="+r),e.converters["script json"]=function(){return o||S.error(r+" was not called"),o[0]},e.dataTypes[0]="json",i=C[r],C[r]=function(){o=arguments},n.always(function(){void 0===i?S(C).removeProp(r):C[r]=i,e[r]&&(e.jsonpCallback=t.jsonpCallback,zt.push(r)),o&&m(i)&&i(o[0]),o=i=void 0}),"script"}),y.createHTMLDocument=((_t=E.implementation.createHTMLDocument("").body).innerHTML="
",2===_t.childNodes.length),S.parseHTML=function(e,t,n){return"string"!=typeof e?[]:("boolean"==typeof t&&(n=t,t=!1),t||(y.createHTMLDocument?((r=(t=E.implementation.createHTMLDocument("")).createElement("base")).href=E.location.href,t.head.appendChild(r)):t=E),o=!n&&[],(i=N.exec(e))?[t.createElement(i[1])]:(i=xe([e],t,o),o&&o.length&&S(o).remove(),S.merge([],i.childNodes)));var r,i,o},S.fn.load=function(e,t,n){var r,i,o,a=this,s=e.indexOf(" ");return-1").append(S.parseHTML(e)).find(r):e)}).always(n&&function(e,t){a.each(function(){n.apply(this,o||[e.responseText,t,e])})}),this},S.expr.pseudos.animated=function(t){return S.grep(S.timers,function(e){return t===e.elem}).length},S.offset={setOffset:function(e,t,n){var r,i,o,a,s,u,l=S.css(e,"position"),c=S(e),f={};"static"===l&&(e.style.position="relative"),s=c.offset(),o=S.css(e,"top"),u=S.css(e,"left"),("absolute"===l||"fixed"===l)&&-1<(o+u).indexOf("auto")?(a=(r=c.position()).top,i=r.left):(a=parseFloat(o)||0,i=parseFloat(u)||0),m(t)&&(t=t.call(e,n,S.extend({},s))),null!=t.top&&(f.top=t.top-s.top+a),null!=t.left&&(f.left=t.left-s.left+i),"using"in t?t.using.call(e,f):c.css(f)}},S.fn.extend({offset:function(t){if(arguments.length)return void 0===t?this:this.each(function(e){S.offset.setOffset(this,t,e)});var e,n,r=this[0];return r?r.getClientRects().length?(e=r.getBoundingClientRect(),n=r.ownerDocument.defaultView,{top:e.top+n.pageYOffset,left:e.left+n.pageXOffset}):{top:0,left:0}:void 0},position:function(){if(this[0]){var 
e,t,n,r=this[0],i={top:0,left:0};if("fixed"===S.css(r,"position"))t=r.getBoundingClientRect();else{t=this.offset(),n=r.ownerDocument,e=r.offsetParent||n.documentElement;while(e&&(e===n.body||e===n.documentElement)&&"static"===S.css(e,"position"))e=e.parentNode;e&&e!==r&&1===e.nodeType&&((i=S(e).offset()).top+=S.css(e,"borderTopWidth",!0),i.left+=S.css(e,"borderLeftWidth",!0))}return{top:t.top-i.top-S.css(r,"marginTop",!0),left:t.left-i.left-S.css(r,"marginLeft",!0)}}},offsetParent:function(){return this.map(function(){var e=this.offsetParent;while(e&&"static"===S.css(e,"position"))e=e.offsetParent;return e||re})}}),S.each({scrollLeft:"pageXOffset",scrollTop:"pageYOffset"},function(t,i){var o="pageYOffset"===i;S.fn[t]=function(e){return $(this,function(e,t,n){var r;if(x(e)?r=e:9===e.nodeType&&(r=e.defaultView),void 0===n)return r?r[i]:e[t];r?r.scrollTo(o?r.pageXOffset:n,o?n:r.pageYOffset):e[t]=n},t,e,arguments.length)}}),S.each(["top","left"],function(e,n){S.cssHooks[n]=Fe(y.pixelPosition,function(e,t){if(t)return t=We(e,n),Pe.test(t)?S(e).position()[n]+"px":t})}),S.each({Height:"height",Width:"width"},function(a,s){S.each({padding:"inner"+a,content:s,"":"outer"+a},function(r,o){S.fn[o]=function(e,t){var n=arguments.length&&(r||"boolean"!=typeof e),i=r||(!0===e||!0===t?"margin":"border");return $(this,function(e,t,n){var r;return x(e)?0===o.indexOf("outer")?e["inner"+a]:e.document.documentElement["client"+a]:9===e.nodeType?(r=e.documentElement,Math.max(e.body["scroll"+a],r["scroll"+a],e.body["offset"+a],r["offset"+a],r["client"+a])):void 0===n?S.css(e,t,i):S.style(e,t,n,i)},s,n?e:void 0,n)}})}),S.each(["ajaxStart","ajaxStop","ajaxComplete","ajaxError","ajaxSuccess","ajaxSend"],function(e,t){S.fn[t]=function(e){return this.on(t,e)}}),S.fn.extend({bind:function(e,t,n){return this.on(e,null,t,n)},unbind:function(e,t){return this.off(e,null,t)},delegate:function(e,t,n,r){return this.on(t,e,n,r)},undelegate:function(e,t,n){return 
1===arguments.length?this.off(e,"**"):this.off(t,e||"**",n)},hover:function(e,t){return this.mouseenter(e).mouseleave(t||e)}}),S.each("blur focus focusin focusout resize scroll click dblclick mousedown mouseup mousemove mouseover mouseout mouseenter mouseleave change select submit keydown keypress keyup contextmenu".split(" "),function(e,n){S.fn[n]=function(e,t){return 0 - +h1.title {font-size: 38px;} +h2 {font-size: 30px;} +h3 {font-size: 24px;} +h4 {font-size: 18px;} +h5 {font-size: 16px;} +h6 {font-size: 12px;} +code {color: inherit; background-color: rgba(0, 0, 0, 0.04);} +pre:not([class]) { background-color: white } +code{white-space: pre-wrap;} +span.smallcaps{font-variant: small-caps;} +span.underline{text-decoration: underline;} +div.column{display: inline-block; vertical-align: top; width: 50%;} +div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} +ul.task-list{list-style: none;} + +code{white-space: pre-wrap;} +span.smallcaps{font-variant: small-caps;} +span.underline{text-decoration: underline;} +div.column{display: inline-block; vertical-align: top; width: 50%;} +div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} +ul.task-list{list-style: none;} + - - + @@ -3296,23 +3110,23 @@ //# sourceMappingURL=crosstalk.min.js.map +code{white-space: pre-wrap;} +span.smallcaps{font-variant: small-caps;} +span.underline{text-decoration: underline;} +div.column{display: inline-block; vertical-align: top; width: 50%;} +div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} +ul.task-list{list-style: none;} +
@@ -42428,7 +42235,7 @@

11.3.3 Anomalous Test -
hvt.prediction[["predictPlot"]]
+
hvt.prediction[["predictPlot"]]

@@ -42439,10 +42246,10 @@

12 Download

The predictions from the above sections can be downloaded in section below. The downloaded predictions can be found in the output(LOCAL) folder.

-
predictClusterData <- hvt.prediction[["scoredPredictedData"]]%>%as.data.frame()
- predictClusterData  %>% head(100) %>% round(2)%>%
-      as.data.frame() %>%
-      Table(scroll = T, limit = 20)
+
predictClusterData <- hvt.prediction[["scoredPredictedData"]]%>%as.data.frame()
+ predictClusterData  %>% head(100) %>% round(2)%>%
+      as.data.frame() %>%
+      Table(scroll = T, limit = 20)
diff --git a/vignettes/muHVT_vignette.Rmd b/vignettes/HVT_vignette.Rmd similarity index 98% rename from vignettes/muHVT_vignette.Rmd rename to vignettes/HVT_vignette.Rmd index 5563fcd..5ecc4ef 100644 --- a/vignettes/muHVT_vignette.Rmd +++ b/vignettes/HVT_vignette.Rmd @@ -1,5 +1,5 @@ --- -title: "muHVT: An Introduction" +title: "HVT: An Introduction" author: "Zubin Dowlaty, Shubhra Prakash, Sangeet Moy Das, Praditi Shah, Shantanu Vaidya, Somya Shambhawi" date: "`r Sys.Date()`" fig.height: 4 @@ -70,8 +70,8 @@ if (length(new.packages)) # Loading the required libraries lapply(list.of.packages, library, character.only = T) -# Sourcing the modified files for muHVT -## Do this if muHVT is unavailable on CRAN +# Sourcing the modified files for HVT +## Do this if HVT is unavailable on CRAN source("../R/Add_boundary_points.R") source("../R/Corrected_Tessellations.R") @@ -155,7 +155,7 @@ set.seed(240) # Abstract -The muHVT package is a collection of R functions to facilitate building [topology preserving maps](https://users.ics.aalto.fi/jhollmen/dippa/node9.html) for rich multivariate data analysis. Tending towards a big data preponderance, a large number of rows. A collection of R functions for this typical workflow is organized below: +The HVT package is a collection of R functions to facilitate building [topology preserving maps](https://users.ics.aalto.fi/jhollmen/dippa/node9.html) for rich multivariate data analysis. Tending towards a big data preponderance, a large number of rows. A collection of R functions for this typical workflow is organized below: 1. **Data Compression**: Vector quantization (VQ), HVQ (hierarchical vector quantization) using means or medians. This step compresses the rows (long data frame) using a compression objective. @@ -315,7 +315,7 @@ The prediction algorithm recursively calculates the distance between each point 3. Check if the cell drills down further to form more cells. 4. If it doesn’t, return the path. 
Or else repeat steps 1 to 4 till we reach a level at which the cell doesn’t drill down further. -# Example I: muHVT with the Torus dataset +# Example I: HVT with the Torus dataset **In this section, we will see how we can use the package to visualize multidimensional data by projecting them to two dimensions using Sammon's projection and further used for scoring** @@ -961,7 +961,7 @@ hist(Act_pred_Table$diff, breaks = 20, col = "blue", main = "Mean Absolute Diffe ``` -# Example II: muHVT with the Personal Computer dataset +# Example II: HVT with the Personal Computer dataset **Data Understanding** @@ -1057,7 +1057,7 @@ As we are familiar with the structure of the computers data, we will now follow ## Step 1: Data Compression -For more detailed information on Data Compression please refer to [section 2](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/muHVT_vignette.html#data-compression) of this vignette. +For more detailed information on Data Compression please refer to [section 2](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/HVT_vignette.html#data-compression) of this vignette. We will use the `HVT` function to compress our data while preserving essential features of the dataset. Our goal is to achieve data compression upto atleast `80%`. In situations where the compression ratio does not meet the desired target, we can explore adjusting the model parameters as a potential solution. This involves making modifications to parameters such as the `quantization error threshold` or `increasing the number of cells` and then rerunning the HVT function again. @@ -1129,7 +1129,7 @@ All the columns after this will contain centroids for each cell. 
They can also b ## Step 2: Data Projection -For more detailed information on Data Projection please refer to [section 3](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/muHVT_vignette.html#data-projection) of this vignette. +For more detailed information on Data Projection please refer to [section 3](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/HVT_vignette.html#data-projection) of this vignette. lets view the projected 2D centroids after performing sammon's projection on the compressed data (440 cells) recieved after performing vector quantization. For the sake of brevity we are displaying first six rows. @@ -1164,7 +1164,7 @@ ggplot(centroid_coordinates, aes(x_coord, y_coord)) + ## Step 3: Tessellation -For more detailed information on voronoi tessellation please refer to [section 4](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/muHVT_vignette.html#tessellation) of this vignette. +For more detailed information on voronoi tessellation please refer to [section 4](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/HVT_vignette.html#tessellation) of this vignette. Now, we have obtained the centroid coordinates resulting from the application of Sammon's projection. @@ -1295,7 +1295,7 @@ muHVT::hvtHmap( ## Step 4: Prediction(predictHVT) -For more detailed information on prediction please refer to [section 5](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/muHVT_vignette.html#prediction) of this vignette. +For more detailed information on prediction please refer to [section 5](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/HVT_vignette.html#prediction) of this vignette. 
**Raw Testing Dataset** @@ -1384,7 +1384,7 @@ hist(Act_pred_Table$diff, breaks = 20, col = "blue", main = "Mean Absolute Diffe # Executive Summary -* **Example I: muHVT with the Torus dataset** +* **Example I: HVT with the Torus dataset** * We have considered torus dataset for multidimensional data visualization using sammons projection. @@ -1398,7 +1398,7 @@ hist(Act_pred_Table$diff, breaks = 20, col = "blue", main = "Mean Absolute Diffe * Once again, we generated a compressed HVT map (hvt.torus3) using the HVT() algorithm on the torus dataset. The parameters for this map were set to `n_cells = 900`, `quant.error = 0.1`, and `depth = 1`. Upon analyzing the compression summary, we found that 85% of the 100 cells have reached the quantization threshold error and we can clearly visualize the 3D torus(donut) in 2D space. -* **Example II: muHVT with the Personal Computer dataset** +* **Example II: HVT with the Personal Computer dataset** * We have considered computers dataset for generating predictions to see which cell and level each point belongs to. diff --git a/vignettes/muHVT_vignette.html b/vignettes/HVT_vignette.html similarity index 98% rename from vignettes/muHVT_vignette.html rename to vignettes/HVT_vignette.html index 92c925a..7ae65c2 100644 --- a/vignettes/muHVT_vignette.html +++ b/vignettes/HVT_vignette.html @@ -12,9 +12,9 @@ - + -muHVT: An Introduction +HVT: An Introduction - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

muHVT : Using mlayerHVT() for Monitoring -Entities over Time

-

Zubin Dowlaty, Shantanu Vaidya

-

2023-03-03

- - - - -
-

1 Abstract

-

The muHVT package is a collection of R functions to facilitate -building topology -preserving maps for rich multivariate data. Tending towards a big -data preponderance, a large number of rows. A collection of R functions -for this typical workflow is organized below :

-
    -
  1. Data Compression: Vector quantization (VQ), HVQ -(hierarchical vector quantization) using means or medians. This step -compresses the rows (long data frame) using a compression -objective

    -
  2. Data Projection: Dimension projection of the -compressed cells to 1D,2D and 3D with the Sammons Non-linear Algorithm. -This step creates topology preserving map coordinates into the desired -output dimension

    -
  3. Tessellation: Create cells required for object -visualization using the Voronoi Tessellation method, package includes -heatmap plots for hierarchical Voronoi tessellations (HVT). This step -enables data insights, visualization, and interaction with the topology -preserving map. Useful for semi-supervised tasks

    -
  4. Prediction: Scoring new data sets and recording -their assignment using the map objects from the above steps, in a -sequence of maps if required

    -
-

This package now additionally provides functionality to predict based -on a set of maps to monitor entities over time.

-

The creation of a predictive set involves four steps -

-
    -
  1. Compress: Compress the dataset using a percentage -compression rate and a quantization threshold using the HVT() function -(Map A)
    -
  2. Remove outlier cells: Manually identify and remove -the outlier cells from the dataset using the removeOutliers() function -(Map B)
    -
  3. Compress the dataset without outliers: Again -compress the dataset without outlier(s) using n_cells, depth and a -quantization threshold using the HVT() function (Map C)
    -
  4. Predict based on a predictive set of maps: Using -the mlayerHVT() function
    -
-

Let us try to understand the steps with the help of the diagram -below.

-
-Figure 1: Flow diagram for predicting based on a set of maps using mlayerHVT() -


-
-

First, the raw data is passed to the HVT function, which constructs a highly compressed Map A. The output is hierarchically arranged vector quantized data, which is used to identify the outlier cells in the dataset from the number of data points within each cell and the z-scores for each cell.
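As a rough illustration of screening cells by point count and z-score, the sketch below works on a hypothetical per-cell summary. The column names, the values, and both cutoffs are assumptions made for this example. In the vignette the outlier cells are picked manually.

```r
# Sketch only: a hypothetical per-cell summary, not output from HVT()
cell_summary <- data.frame(
  cell        = 1:8,
  n           = c(120, 95, 110, 3, 130, 101, 2, 99),
  quant_error = c(0.11, 0.13, 0.12, 0.55, 0.10, 0.14, 0.12, 0.13)
)

# z-score of each cell's quantization error across all cells
cell_summary$z <- as.numeric(scale(cell_summary$quant_error))

# Candidate outlier cells: unusually high error AND very few points
# (both cutoffs are illustrative assumptions)
candidates <- subset(cell_summary, z > 2 & n < 5)$cell
candidates
```

Cell 7 also has very few points, but its error is typical, so only cell 4 is flagged under these cutoffs.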

-

The identified outlier cell(s) are then passed to the removeOutliers function along with Map A. This function removes the identified outlier cell(s) from the dataset and stores them in Map B, as shown in the diagram. The function returns a list of two items: the newly constructed map (Map B) and a subset of the dataset without the outlier cell(s).

-

The plotCells function plots the Voronoi tessellations for the compressed map (Map A) and highlights the identified outlier cell(s) in red. It takes the identified outlier cell number(s) and the compressed map (Map A) as input.

-

The dataset without outlier(s), returned by the removeOutliers function, is then passed to the HVT function along with other parameters such as n_cells, quant.err, and depth to construct another map (Map C).

-

Finally, all the constructed maps are passed to the mlayerHVT function along with the test dataset. The function scores each test record and reports which map and which cell the record is assigned to.

- - - - - -
-
-

2 Data Understanding

-

In this notebook, we will use the Prices of Personal Computers dataset. This dataset contains 6261 observations and 6 features. It records the price, from 1993 to 1995, of 486 personal computers in the US. The variables are price, speed, hd, ram, screen, and ads.

-

Here, we load the data and store it in variables.

-
set.seed(240)
-# Load data from csv files
-trainComputers <- read.csv("https://raw.githubusercontent.com/Mu-Sigma/muHVT/dev/vignettes/sample_dataset/trainComputers.csv")
-testComputers <- read.csv("https://raw.githubusercontent.com/Mu-Sigma/muHVT/dev/vignettes/sample_dataset/testComputers.csv")
-

Let’s have a look at some of the training data

-
# Quick peek
-Table(head(trainComputers))
-
-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-price - -speed - -hd - -ram - -screen - -ads -
-1499 - -25 - -80 - -4 - -14 - -94 -
-1795 - -33 - -85 - -2 - -14 - -94 -
-1595 - -25 - -170 - -4 - -15 - -94 -
-1849 - -25 - -170 - -8 - -14 - -94 -
-3295 - -33 - -340 - -16 - -14 - -94 -
-3695 - -66 - -340 - -16 - -14 - -94 -
-
-

Now let us check the structure of the training data

-
str(trainComputers)
-#> 'data.frame':    5008 obs. of  6 variables:
-#>  $ price : int  1499 1795 1595 1849 3295 3695 1720 1995 2225 2575 ...
-#>  $ speed : int  25 33 25 25 33 66 25 50 50 50 ...
-#>  $ hd    : int  80 85 170 170 340 340 170 85 210 210 ...
-#>  $ ram   : int  4 2 4 8 16 16 4 2 8 4 ...
-#>  $ screen: int  14 14 15 14 14 14 14 14 14 15 ...
-#>  $ ads   : int  94 94 94 94 94 94 94 94 94 94 ...
-

Let’s get a summary of the training data

-
summary(trainComputers)
-#>      price          speed              hd              ram        
-#>  Min.   : 949   Min.   : 25.00   Min.   :  80.0   Min.   : 2.000  
-#>  1st Qu.:1824   1st Qu.: 33.00   1st Qu.: 212.0   1st Qu.: 4.000  
-#>  Median :2195   Median : 50.00   Median : 340.0   Median : 8.000  
-#>  Mean   :2271   Mean   : 48.21   Mean   : 356.1   Mean   : 7.677  
-#>  3rd Qu.:2644   3rd Qu.: 66.00   3rd Qu.: 450.0   3rd Qu.: 8.000  
-#>  Max.   :5399   Max.   :100.00   Max.   :2100.0   Max.   :32.000  
-#>      screen           ads       
-#>  Min.   :14.00   Min.   : 94.0  
-#>  1st Qu.:14.00   1st Qu.:216.0  
-#>  Median :14.00   Median :259.0  
-#>  Mean   :14.53   Mean   :243.7  
-#>  3rd Qu.:15.00   3rd Qu.:283.0  
-#>  Max.   :17.00   Max.   :339.0
- -
-

3 Map A : Compress using vector quantization

-

This package can perform vector quantization using the following -algorithms -

-
    -
  • Hierarchical Vector Quantization using k−means
  • -
  • Hierarchical Vector Quantization using k−medoids
  • -
-

For more information on vector quantization, refer to the following link.

-

The HVT function constructs highly compressed hierarchical Voronoi tessellations. The raw data is first scaled, and the scaled data is supplied as input to the vector quantization algorithm. The algorithm compresses the dataset until a user-defined compression percentage is achieved. The quantization error parameter acts as the threshold that determines this percentage: for a given compression percentage, the algorithm arrives at the number of cells ‘n’ at which the required percentage of cells has a quantization error below the threshold.
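The relationship between cells, quantization error, and the threshold can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: `stats::kmeans` replaces the package's quantizer, the per-point distance is the L1 norm averaged over dimensions, and the threshold value is chosen for the toy data.

```r
# Sketch only: base-R k-means as a stand-in for the package's quantization
set.seed(240)
toy <- scale(matrix(rnorm(500 * 3), ncol = 3))   # z-score scaling first

n_cells   <- 20
threshold <- 0.8                                  # illustrative threshold
vq <- kmeans(toy, centers = n_cells, nstart = 5, iter.max = 50)

# Distance of every point to its own centroid: L1 norm averaged over dimensions
# (the averaging is an assumption made for this sketch)
l1 <- rowMeans(abs(toy - vq$centers[vq$cluster, , drop = FALSE]))

# Quantization error per cell with the "max" error metric
quant_error <- tapply(l1, vq$cluster, max)

# Share of cells whose error is below the threshold
pct_below <- 100 * mean(quant_error < threshold)
pct_below
```

Raising `n_cells` shrinks the cells, which generally pushes more of them below the threshold.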

-

Let us try to understand the HVT function -

-
HVT(
-  dataset,
-  min_compression_perc,
-  n_cells,
-  depth,
-  quant.err,
-  distance_metric = c("L1_Norm", "L2_Norm"),
-  error_metric = c("mean", "max"),
-  quant_method = c("kmeans", "kmedoids"),
-  normalize = TRUE,
-  diagnose = FALSE,
-  hvt_validation = FALSE,
-  train_validation_split_ratio = 0.8
-)
-

Each of the parameters is explained below:

-
    -
  • dataset - A dataframe with numeric -columns

  • -
  • min_compression_perc - An integer -indicating the minimum percent compression rate to be achieved for the -dataset

  • -
  • n_cells - An integer indicating the -number of cells per hierarchy (level)

  • -
  • depth - An integer indicating the -number of levels. (1 = No hierarchy, 2 = 2 levels, etc …)

  • -
  • quant.err - A number indicating the quantization error threshold. A cell will only break down into further cells if its quantization error is above this threshold

  • -
  • distance_metric - The distance metric can be L1_Norm or L2_Norm. L1_Norm is selected by default. The distance metric is used to calculate the distance between an n-dimensional point and a centroid. The user can also pass a custom function to calculate this distance

  • -
  • error_metric - The error metric can be mean or max. max is selected by default. max returns the maximum of m values and mean returns the mean of m values, where each value is the distance between a point and the centroid of its cell. The user can also pass a custom function to calculate the error metric

  • -
  • quant_method - The quantization -method can be kmeans or kmedoids. -kmeans is selected by default

  • -
  • normalize - A logical value -indicating whether the columns in your dataset need to be normalized. -Default value is TRUE. The algorithm supports Z-score -normalization

  • -
  • diagnose - A logical value -indicating whether user wants to perform diagnostics on the model. -Default value is TRUE.

  • -
  • hvt_validation - A logical value -indicating whether user wants to holdout a validation set and find mean -absolute deviation of the validation points from the centroid. Default -value is FALSE.

  • -
  • train_validation_split_ratio - A -numeric value indicating train validation split ratio. This argument is -only used when hvt_validation has been set to TRUE. Default value for -the argument is 0.8

  • -
-
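To make the distance_metric and error_metric options concrete, here is a small worked example in plain R. The helper names `l1_norm` and `l2_norm` are hypothetical and only mirror the option names. The package may normalize these distances differently, so treat the numbers as an illustration of the definitions rather than of HVT() output.

```r
# Sketch only: plain R versions of the two distance and two error metrics
l1_norm <- function(point, centroid) sum(abs(point - centroid))
l2_norm <- function(point, centroid) sqrt(sum((point - centroid)^2))

centroid <- c(0, 0)
points   <- rbind(c(3, 4), c(1, 0), c(0, 2))   # m = 3 points in one cell

d1 <- apply(points, 1, l1_norm, centroid = centroid)  # 7 1 2
d2 <- apply(points, 1, l2_norm, centroid = centroid)  # 5 1 2

# Error of the cell under each combination of metrics
c(mean_L1 = mean(d1), max_L1 = max(d1))
c(mean_L2 = mean(d2), max_L2 = max(d2))
```

With the max error metric a single distant point decides whether the cell exceeds the threshold, while the mean error metric is less sensitive to individual points.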

First, we will construct Map A by performing hierarchical vector quantization with the parameter min_compression_perc set to 70% and the quantization error threshold set to 0.2. The compressed map is always constructed at depth = 1.

-
hvt_mapA <- list()
-
-hvt_mapA <-muHVT::HVT(trainComputers,
-                min_compression_perc = 70,
-                quant.err = 0.2,
-                distance_metric = "L1_Norm",
-                error_metric = "max",
-                quant_method = "kmeans",
-                normalize = TRUE
-)
-

[1] “For the given dataset 70% compression rate is achieved at -n_cells : 318”
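The message above reports the number of cells at which the target was met. One way to picture such a search is a loop that raises the number of cells until the required share of cells falls below the error threshold. The sketch below works under that assumption and is not the package's implementation. `pct_cells_below` is a hypothetical helper, and the threshold is chosen for the toy data.

```r
# Sketch only: a brute-force search for the smallest n_cells meeting the target
set.seed(240)
toy <- scale(matrix(rnorm(400 * 2), ncol = 2))

# Hypothetical helper: percentage of cells whose max L1 error is below the threshold
pct_cells_below <- function(data, n_cells, threshold) {
  vq  <- kmeans(data, centers = n_cells, nstart = 3, iter.max = 50)
  err <- rowMeans(abs(data - vq$centers[vq$cluster, , drop = FALSE]))
  100 * mean(tapply(err, vq$cluster, max) < threshold)
}

target    <- 70    # plays the role of min_compression_perc
threshold <- 0.5   # plays the role of quant.err
n_cells   <- 2

# Stop as soon as the target is met, or at an upper bound of 100 cells
while (n_cells < 100 && pct_cells_below(toy, n_cells, threshold) < target) {
  n_cells <- n_cells + 1
}
n_cells
```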

-

Now, let’s check the compression summary for HVT Map A. The table below shows, for each level, the number of cells, the number of cells with a quantization error below the threshold, and the percentage of cells with a quantization error below the threshold.

-
-mapA_compression_summary <- hvt_mapA[[3]]$compression_summary %>% dplyr::mutate_if(is.numeric, ~ round(., 4))
-DT::datatable(mapA_compression_summary, class = 'cell-border stripe', options = list(scrollX = TRUE))
-
- -

As per the manual, hvt_mapA[[3]] gives us detailed information about the hierarchical vector quantized data. hvt_mapA[[3]][['summary']] gives a table containing the number of points, the quantization error, and the codebook.

-

Now let us understand what each column in the above table means:

-
    -
  • Segment.Level - Level of the cell. -In this case, we have performed Vector Quantization for depth 1. Hence -Segment Level is 1

  • -
  • Segment.Parent - Parent segment of -the cell

  • -
  • Segment.Child (Cell.Number) - The -children of a particular cell. In this case, it is the total number of -cells at which we achieved the defined compression percentage

  • -
  • n - No of points in each -cell

  • -
  • Cell.ID - Cell_ID’s are generated -for the multivariate data using 1-D Sammon’s Projection -algorithm

  • -
  • Quant.Error - Quantization Error -for each cell

  • -
-

All the columns after this contain the centroids for each cell. Together they form the codebook, which is the collection of all centroids or codewords.
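A codebook is used by looking up the codeword closest to a given point. The following minimal sketch uses a hypothetical three-row codebook and the L1 norm. `nearest_codeword` is an illustrative helper and not a package function.

```r
# Sketch only: a hypothetical codebook with three codewords (rows) and two features
codebook <- rbind(c(0, 0), c(5, 5), c(10, 0))

# Index of the nearest codeword under the L1 norm
nearest_codeword <- function(x, codebook) {
  which.min(rowSums(abs(sweep(codebook, 2, x))))
}

nearest_codeword(c(4, 6), codebook)   # 2
nearest_codeword(c(9, 1), codebook)   # 3
```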

-
cell_mapping_df <- arrange(hvt_mapA[[3]]$summary, n) %>% mutate_if(is.numeric, round, digits = 4)
-DT::datatable(cell_mapping_df, filter = 'top', options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')
-
- -

Let’s have a look at the function hvtHmap, which we will use to overlay a variable as a heatmap.

-
hvtHmap(hvt.results, dataset, child.level, hmap.cols, color.vec ,line.width, palette.color = 6)
-
    -
  • hvt.results - A list of hvt.results -obtained from the HVT function

  • -
  • dataset - A dataframe containing -the variables to overlay as a heatmap. The user can pass an external -dataset or the dataset that was used to perform hierarchical vector -quantization. The dataset should have the same number of points as the -dataset used to perform hierarchical Vector Quantization in the HVT -function

  • -
  • child.level - A number indicating -the level for which the heat map is to be plotted

  • -
  • hmap.cols - The column number or column name from the dataset indicating the variables for which the heat map is to be plotted. To plot the quantization error as a heatmap, pass 'quant_error'. Similarly, to plot the number of points in each cell as a heatmap, pass 'no_of_points' as a parameter

  • -
  • color.vec - A color vector such -that length(color.vec) = child.level (default = NULL)

  • -
  • line.width - A line width vector -such that length(line.width) = child.level (default = NULL)

  • -
  • palette.color - A number indicating -the heat map color palette. 1 - rainbow, 2 - heat.colors, 3 - -terrain.colors, 4 - topo.colors, 5 - cm.colors, 6 - BlCyGrYlRd -(Blue,Cyan,Green,Yellow,Red) color (default = 6)

  • -
  • show.points - A boolean indicating -whether the centroids should be plotted on the tessellations (default = -FALSE)

  • -
-

Now let’s plot all the features for each cell at level one as a -heatmap.

-
-hmap <- list()
-col_list <- colnames(trainComputers)
-# Build one heatmap per feature of the training data
-hmap <- lapply(seq_along(col_list), function(x){
-  hvtHmap(
-  hvt_mapA,
-  trainComputers,
-  child.level = 1,
-  hmap.cols = col_list[[x]],
-  line.width = c(0.2),
-  color.vec = c("#141B41"),
-  palette.color = 6,
-  centroid.size = 1.0,
-  show.points = TRUE,
-  quant.error.hmap = 0.2,
-  n_cells.hmap = 15
-) #%>% ggplotly() # for plotly
-})
-
grid.arrange(hmap[[1]], nrow = 1, ncol=1)
-

-
grid.arrange(hmap[[2]], nrow = 1, ncol=1)
-

-
grid.arrange(hmap[[3]], nrow = 1, ncol=1)
-

-
grid.arrange(hmap[[4]], nrow = 1, ncol=1)
-

-
grid.arrange(hmap[[5]], nrow = 1, ncol=1)
-

-
grid.arrange(hmap[[6]], nrow = 1, ncol=1)
-

-
-
-

4 Map B : Identify and remove the outlier cells

-

The removeOutliers function removes the identified outlier cell(s) -from the dataset and stores them in Map B.

-

It takes as input the cell number (Segment.Child) of the identified outlier cell(s) from the above table and the compressed HVT map (Map A). It returns a list of two items: HVT Map B, which holds the removed outlier cell(s), and a subset of the dataset without the outliers.

-
identified_outlier_cells <- c(10, 53, 198)
-output_list <- removeOutliers(identified_outlier_cells, hvt_mapA)
-

[1] “The following cell(s) have been removed as outliers from the -dataset: 10 53 198”

-
hvt_mapB <- output_list[[1]]
-dataset_without_outliers <- output_list[[2]]
-

Note - In HVT Map B, the total number of cells equals the number of outlier cell(s) removed from HVT Map A. In the above case, three cells (the 10th, 53rd, and 198th) are identified as outliers and removed. The 10th cell then becomes the first cell in HVT Map B, the 53rd the second cell, and the 198th the third cell.
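The renumbering described in the note can be reproduced with `match()`. The sketch below uses a hypothetical vector of cell assignments for ten rows. It illustrates the bookkeeping only and does not show how removeOutliers() is implemented.

```r
# Sketch only: hypothetical Map A cell assignments for ten training rows
cell_of_row   <- c(10, 4, 53, 7, 198, 10, 4, 7, 53, 12)
outlier_cells <- c(10, 53, 198)

is_outlier <- cell_of_row %in% outlier_cells

# Position of each removed row's cell within Map B (10 -> 1, 53 -> 2, 198 -> 3)
mapB_cell <- match(cell_of_row[is_outlier], outlier_cells)

rows_removed <- which(is_outlier)    # rows that move to Map B
rows_kept    <- which(!is_outlier)   # rows left for Map C
```

Here `mapB_cell` is `1 2 3 1 2`, so the five removed rows fall into the three cells of Map B in the order the outlier cells were supplied.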

-

Let’s have a look at the removed outlier rows data

-
- -
-

4.1 Voronoi tessellation to highlight outlier cell in the map

-

The plotCells function is used to plot the Voronoi tessellation using -the compressed HVT map (Map A) and highlights the identified outlier -cell(s) in red on the map.

-
plotCells(identified_outlier_cells, hvt_mapA)
-
-Figure 3: The Voronoi Tessellation constructed using the compressed HVT map (Map A) with the outlier cell(s) highlighted in red -


-
-
-
-
-

5 Map C : Construct a map on the dataset without outlier(s)

-

Construct another hierarchical Voronoi tessellation (Map C) on the -dataset without outlier(s) using the HVT function.

-

Pass the following inputs to the HVT function: ‘n_cells’, which determines the number of cells at each depth; ‘depth’, which indicates the number of levels; and ‘quant.err’, the quantization error threshold that determines whether a cell is split further down the hierarchy.

-
set.seed(240)
-hvt_mapC <- list()
-
-mapA_scale_summary = hvt_mapA[[3]]$scale_summary
-hvt_mapC <- HVT(dataset_without_outliers,
-                    n_cells = 15,
-                    depth = 2,
-                    quant.err = 0.2,
-                    distance_metric = "L1_Norm",
-                    error_metric = "max",
-                    quant_method = "kmeans",
-                    projection.scale = 10,
-                    normalize = FALSE,
-                    scale_summary = mapA_scale_summary)
-

Now let’s check the compression summary for HVT Map C. The table -below shows no of cells, no of cells having quantization error below -threshold and percentage of cells having quantization error below -threshold for each level.

-
-mapC_compression_summary <- hvt_mapC[[3]]$compression_summary %>% dplyr::mutate_if(is.numeric, ~ round(., 4))
-DT::datatable(mapC_compression_summary, class = 'cell-border stripe', options = list(scrollX = TRUE))
-
- -

hvt_mapC[[3]] gives us detailed -information about the hierarchical vector quantized data. -hvt_mapC[[3]][['summary']] gives a nice -tabular data containing no of points, Quantization Error and the -codebook.

-
- -

Now let’s plot the price feature for each cell at level two as a heatmap.

-
hvtHmap(
-  hvt_mapC,
-  trainComputers,
-  child.level = 2,
-  hmap.cols = "price",
-  line.width = c(0.6, 0.4),
-  color.vec = c("#141B41", "#0582CA"),
-  palette.color = 6,
-  centroid.size = 1.0,
-  show.points = T,
-  quant.error.hmap = 0.2,
-  n_cells.hmap = 15
-)
-
-Figure 4: The Voronoi Tessellation with the heat map overlaid for variable ’price’ in the ’computers’ dataset -


-
-
-
-

6 Prediction on Test Data

-

The mlayerHVT function is used to score the test data using the predictive set of maps. It takes the test data and the set of maps as input and predicts which map and which cell each test record is assigned to.

-

Let’s have a look at some of the test data

-
# Quick peek
-Table(head(testComputers))
-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-price - -speed - -hd - -ram - -screen - -ads -
-1540 - -33 - -214 - -4 - -15 - -191 -
-3094 - -50 - -1000 - -24 - -15 - -191 -
-1794 - -50 - -214 - -4 - -14 - -191 -
-2408 - -100 - -270 - -4 - -14 - -191 -
-2454 - -66 - -720 - -16 - -15 - -191 -
-1969 - -66 - -1000 - -8 - -14 - -191 -
-
-

Now let us check the structure of the test data

-
str(testComputers)
-#> 'data.frame':    1253 obs. of  6 variables:
-#>  $ price : int  1540 3094 1794 2408 2454 1969 2904 1545 1718 1604 ...
-#>  $ speed : int  33 50 50 100 66 66 50 66 66 33 ...
-#>  $ hd    : int  214 1000 214 270 720 1000 1000 340 340 214 ...
-#>  $ ram   : int  4 24 4 4 16 8 24 8 4 4 ...
-#>  $ screen: int  15 15 14 14 15 14 15 14 14 14 ...
-#>  $ ads   : int  191 191 191 191 191 191 191 191 191 191 ...
-

Let’s get a summary of the test data

-
summary(testComputers)
-#>      price          speed              hd              ram       
-#>  Min.   : 949   Min.   : 33.00   Min.   : 125.0   Min.   : 2.00  
-#>  1st Qu.:1654   1st Qu.: 50.00   1st Qu.: 428.0   1st Qu.: 8.00  
-#>  Median :1904   Median : 66.00   Median : 545.0   Median : 8.00  
-#>  Mean   :2017   Mean   : 67.16   Mean   : 658.2   Mean   :10.76  
-#>  3rd Qu.:2344   3rd Qu.: 75.00   3rd Qu.: 850.0   3rd Qu.:16.00  
-#>  Max.   :3994   Max.   :100.00   Max.   :2100.0   Max.   :32.00  
-#>      screen           ads     
-#>  Min.   :14.00   Min.   : 39  
-#>  1st Qu.:14.00   1st Qu.:129  
-#>  Median :15.00   Median :152  
-#>  Mean   :14.93   Mean   :132  
-#>  3rd Qu.:15.00   3rd Qu.:163  
-#>  Max.   :17.00   Max.   :292
-

To validate the predictions, rows 493, 550, 753, and 1253 were manually added to the test dataset to check whether they get mapped to the correct outlier cell in HVT Map B, as per the table below (ground truth).

-

Now, Let us understand the mlayerHVT function -

-
predictions <- mlayerHVT(testComputers,
-                          hvt_mapA,
-                          hvt_mapB,
-                          hvt_mapC,
-                          ...)
-

The parameters for the function mlayerHVT are as -below

-
    -
  • data - A dataframe containing the test dataset. The dataframe should have at least one variable used for training. The variables from this dataset can also be used to overlay as a heatmap

  • -
  • hvt_mapA - A list of hvt.results -obtained from the HVT function : Map A

  • -
  • hvt_mapB - A list of outlier(s) -cells obtained from the removeOutliers function : Map B

  • -
  • hvt_mapC - A list of hvt.results -obtained from the HVT function : Map C

  • -
  • ... - distance_metric and -error_metric can be passed from here

  • -
-

The function predicts based on the set of HVT maps: Map A, Map B, and Map C. Each test record is assigned to Map A and to exactly one of the other two maps, Map B or Map C. It can never be assigned to all three maps. For example, an outlier record in the test dataset is assigned to Map A and Map B. It is not assigned to Map C, since that map is constructed on the dataset without outliers.
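The assignment rule can be sketched with nearest-centroid lookups. This is a simplified illustration under stated assumptions, not the mlayerHVT() implementation: the centroids are hypothetical, the L1 norm is used, and `nearest` and `score_record` are illustrative helpers.

```r
# Sketch only: hypothetical centroids for the three maps (two features)
mapA <- rbind(c(0, 0), c(5, 5), c(20, 20))     # cell 3 is treated as the outlier cell
outlier_cells <- 3
mapB <- mapA[outlier_cells, , drop = FALSE]    # Map B holds only the outlier cells
mapC <- rbind(c(0, 0), c(2, 2), c(5, 5), c(6, 4))

nearest <- function(x, centroids) which.min(rowSums(abs(sweep(centroids, 2, x))))

score_record <- function(x) {
  cell_A <- nearest(x, mapA)
  if (cell_A %in% outlier_cells) {
    # Outlier record: reported against Map A and Map B, never Map C
    list(cell_A = cell_A, cell_B = match(cell_A, outlier_cells), cell_C = NA_integer_)
  } else {
    # Regular record: reported against Map A and Map C
    list(cell_A = cell_A, cell_B = NA_integer_, cell_C = nearest(x, mapC))
  }
}

r1 <- score_record(c(19, 21))  # lands in Map A cell 3, so it goes to Map B cell 1
r2 <- score_record(c(1, 2))    # regular record, scored against Map C
```

Each record gets a Map A cell plus exactly one of a Map B or a Map C cell, which matches the rule described above.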

-

For more information on the prediction algorithm, refer to the link.

-

Note : The prediction algorithm will not work if some of the variables used to perform quantization are missing. We should not remove any features from the test dataset.
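A simple pre-check for this requirement, together with scaling the test data by the training statistics, might look like the sketch below. The two data frames are hypothetical, and applying the training mean and standard deviation to the test data is an assumption made for this example rather than a description of the package internals.

```r
# Sketch only: hypothetical training and test frames
train <- data.frame(price = c(1500, 2500, 3500), ram = c(4, 8, 12))
test  <- data.frame(price = c(2500, 4500), ram = c(8, 16))

# All training variables must be present in the test data
missing_vars <- setdiff(names(train), names(test))
stopifnot(length(missing_vars) == 0)

# Scale the test data with the TRAINING mean and standard deviation
train_mean  <- colMeans(train)
train_sd    <- apply(train, 2, sd)
test_scaled <- scale(test[names(train)], center = train_mean, scale = train_sd)
test_scaled
```

The first test row equals the training mean and scales to 0 in both columns. The second row lies two training standard deviations above the mean in both columns.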

-
set.seed(240)
-predictions <- list()
-
-predictions <- mlayerHVT(testComputers,
-                          hvt_mapA,
-                          hvt_mapB,
-                          hvt_mapC
-                          )
-

Below are the predicted results on the test data based on the -predictive set of maps : Map A, Map B and Map C -

-
- -

Hence, from the table above, we can clearly see that the predictions -are exactly matching with the ground truth.

-
    -
  • The 493rd and 753rd test records are predicted as outlier records and are mapped to the 3rd outlier cell of HVT Map B.

  • -
  • Similarly, the 550th and 1253rd test records are correctly predicted as outlier records and are mapped to the 1st and 2nd cells of HVT Map B, respectively.

  • -
  • Also, the 889th record is identified as an outlier and is mapped to the 3rd cell of HVT Map B.

  • -
  • The remaining non-outlier records get mapped to the HVT Map -C.

  • -
-
-
-

7 Executive Summary

-
    -
  • We have used the computers dataset to create a predictive set of maps for monitoring entities over time using mlayerHVT() in this notebook

  • -
  • We construct a compressed HVT map (Map A) using HVT() on the training dataset by setting min_compression_perc to 70% and quant.err to 0.2

  • -
  • Based on the z-scores from the output of the above step, we identify the outlier cell(s) in the training dataset. For this dataset, we identify the 10th, 53rd, and 198th cells as the outlier cells.

  • -
  • We pass the identified outlier cell(s) as a parameter to removeOutliers() along with HVT Map A. The function removes the outlier cell(s) from the dataset and stores them in another map called HVT Map B. It also returns the dataset without outlier(s) along with Map B. Here, the 10th, 53rd, and 198th cells get removed as the outlier cells

  • -
  • plotCells() constructs hierarchical Voronoi tessellations and highlights the identified outlier cell(s) in red

  • -
  • The dataset without outlier(s) is then passed to HVT() to construct another HVT map (Map C). Here, we set the parameters n_cells = 15, depth = 2, etc. when constructing the map.

  • -
  • Finally, the set of maps (Map A, Map B, and Map C) is passed to mlayerHVT() along with the test dataset to predict which map and which cell each test record is assigned to. The outlier record(s) present in the test dataset get mapped to Map A and Map B, whereas the non-outlier record(s) get mapped to Map A and Map C.

  • -
  • In the test data, the 493rd, 753rd, and 889th test records are predicted as outlier records and are mapped to the 3rd outlier cell of HVT Map B. Similarly, the 550th and 1253rd test records are correctly predicted as outlier records and are mapped to the 1st and 2nd cells of HVT Map B, respectively.

  • -
-
- - - - - - - - - - - -