Cutoff of user IDs, item IDs, etc. in the dataset #36

Open
shuDaoNan9 opened this issue Dec 6, 2019 · 1 comment


@shuDaoNan9

I have a few questions from running the DCN model on the following dataset:
http://labs.criteo.com/2014/02/download-kaggle-display-advertising-challenge-dataset/
Kaggle Display Advertising Challenge Dataset
As far as I can see, the data format is:
The columns are tab separated with the following schema:
<integer feature 1> ... <integer feature 13> <categorical feature 1> ... <categorical feature 26>
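To make the schema concrete, here is a minimal sketch that splits one tab-separated line into its 13 integer and 26 categorical fields. The function name `parse_criteo_line` is hypothetical, and returning empty fields as `None` is an assumption, not necessarily what get_criteo_feature.py does. The sketch also accepts a 40-field line with a leading click label, since the Kaggle training file carries a label before the features. Note that nothing in this structure marks any column as a user ID or item ID, which is exactly the point raised here.

```python
from typing import List, Optional, Tuple

NUM_INT = 13  # <integer feature 1> ... <integer feature 13>
NUM_CAT = 26  # <categorical feature 1> ... <categorical feature 26>


def parse_criteo_line(
    line: str,
) -> Tuple[Optional[int], List[Optional[int]], List[Optional[str]]]:
    """Split one line into (label, integer features, categorical features).

    Hypothetical helper: empty fields are returned as None (an assumption
    about how missing values should be represented).
    """
    # Strip only the line terminator; trailing tabs mark empty last fields.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 1 + NUM_INT + NUM_CAT:
        label: Optional[int] = int(fields[0])  # leading click label
        fields = fields[1:]
    elif len(fields) == NUM_INT + NUM_CAT:
        label = None  # unlabeled line
    else:
        raise ValueError(
            f"expected {NUM_INT + NUM_CAT} or {1 + NUM_INT + NUM_CAT} fields, "
            f"got {len(fields)}"
        )
    ints = [int(v) if v else None for v in fields[:NUM_INT]]
    cats = [v if v else None for v in fields[NUM_INT:]]
    return label, ints, cats


# Usage: a labeled line with one non-empty categorical value.
sample = "\t".join(["1"] + [str(i) for i in range(NUM_INT)] + ["68fd1e64"] + [""] * (NUM_CAT - 1))
label, ints, cats = parse_criteo_line(sample)
print(label, ints[:3], cats[:2])
```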
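It doesn't distinguish user IDs from item IDs, so how would you make recommendations for users? Also, I see that get_criteo_feature.py simply truncates away many of the categorical values during processing, so how can users still be told apart?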
parser.add_argument(
"--cutoff",
type=int,
default=200,
help="cutoff long-tailed categorical values"
)

Thanks!

@Ethan199111

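The cutoff is there to keep the embedding tables for ID-type features to a manageable size: all long-tail IDs are mapped to index 0. If you know how to handle large-scale sparse ID features with a parameter server, you can also include all of them in training.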
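A minimal sketch of the cutoff described above, assuming a per-column frequency count: values seen at least `cutoff` times get their own index starting at 1, and every rarer (long-tail) or unseen value shares index 0. The helper names `build_vocab` and `encode` are hypothetical, and whether get_criteo_feature.py compares with `>=` or `>`, or orders indices this way, is an assumption; check the script itself for the exact rule.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_vocab(values: Iterable[str], cutoff: int = 200) -> Dict[str, int]:
    """Hypothetical helper: map each value seen >= `cutoff` times to an index >= 1.

    Index 0 is reserved for long-tail and unseen values.
    """
    counts = Counter(values)
    # Deterministic order: most frequent first, ties broken by the value itself.
    frequent = sorted(
        (v for v, c in counts.items() if c >= cutoff),
        key=lambda v: (-counts[v], v),
    )
    return {v: i + 1 for i, v in enumerate(frequent)}


def encode(values: Iterable[str], vocab: Dict[str, int]) -> List[int]:
    """Look up each value; anything outside the vocabulary falls back to 0."""
    return [vocab.get(v, 0) for v in values]


column = ["a"] * 5 + ["b"] * 3 + ["c"]
vocab = build_vocab(column, cutoff=3)
print(vocab)                                  # {'a': 1, 'b': 2}
print(encode(["a", "b", "c", "zzz"], vocab))  # [1, 2, 0, 0]
# The embedding table for this column needs len(vocab) + 1 rows.
```

This is why rare IDs can no longer be told apart after the cutoff: they all share the single embedding row at index 0. In exchange, each column's embedding table has only `len(vocab) + 1` rows instead of one row per distinct ID.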
