4 files changed, +75 -0 lines changed

@@ -1 +1,2 @@
 .venv/
+/model.pkl

@@ -1,2 +1,3 @@
 /data.xml
 /prepared
+/features

@@ -21,3 +21,47 @@ stages:
       md5: 153aad06d376b6595932470e459ef42a.dir
       size: 8437363
       nfiles: 2
+  featurize:
+    cmd: python src/featurization.py data/prepared data/features
+    deps:
+    - path: data/prepared
+      hash: md5
+      md5: 153aad06d376b6595932470e459ef42a.dir
+      size: 8437363
+      nfiles: 2
+    - path: src/featurization.py
+      hash: md5
+      md5: e22789fc9581cad11ef7a6fa3aa3f17b
+      size: 4158
+    params:
+      params.yaml:
+        featurize.max_features: 100
+        featurize.ngrams: 1
+    outs:
+    - path: data/features
+      hash: md5
+      md5: f8f5cbc3188008a7542d02d63054d9d2.dir
+      size: 1556290
+      nfiles: 2
+  train:
+    cmd: python src/train.py data/features model.pkl
+    deps:
+    - path: data/features
+      hash: md5
+      md5: f8f5cbc3188008a7542d02d63054d9d2.dir
+      size: 1556290
+      nfiles: 2
+    - path: src/train.py
+      hash: md5
+      md5: 324001573ed724e5ae092226fcf9ca30
+      size: 1666
+    params:
+      params.yaml:
+        train.min_split: 0.01
+        train.n_est: 50
+        train.seed: 20170428
+    outs:
+    - path: model.pkl
+      hash: md5
+      md5: cfa72ff6e2575c44f78f423cada5b783
+      size: 1855075
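
Reviewer note: each single-file entry in the lock hunk above records an `md5` and a `size`. A minimal sketch of how such an entry could be checked by hand, assuming `hash: md5` means a plain MD5 over the file's raw bytes (an assumption, not confirmed by this diff). The helper names `file_md5_and_size` and `matches_lock_entry` are hypothetical, and entries whose hash ends in `.dir` describe directories and are not covered here.

```python
import hashlib
import os
import tempfile


def file_md5_and_size(path, chunk_size=1 << 20):
    """Return (hex MD5 digest, size in bytes) of the file at `path`."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def matches_lock_entry(path, entry):
    """Compare a file against a dict shaped like one single-file lock entry."""
    md5, size = file_md5_and_size(path)
    return md5 == entry["md5"] and size == entry["size"]


# Demo on a throwaway file, so the sketch runs without the real artifacts.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")

demo_entry = {
    "path": tmp.name,
    "hash": "md5",
    "md5": hashlib.md5(b"hello").hexdigest(),
    "size": 5,
}
demo_ok = matches_lock_entry(tmp.name, demo_entry)
# A wrong size must be reported as a mismatch.
demo_mismatch = matches_lock_entry(tmp.name, {**demo_entry, "size": 6})
os.remove(tmp.name)
```

Against the real output, the same call would take `model.pkl` and the `md5` and `size` values from the `train` entry above.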

@@ -3,6 +3,14 @@ artifacts:
     path: data/data.xml
     type: dataset
     desc: Initial XML StackOverflow dataset (raw data)
+  text-classification:
+    path: model.pkl
+    desc: Detect whether the given stackoverflow question should have R language tag
+    type: model
+    labels:
+      - nlp
+      - classification
+      - stackoverflow
 stages:
   prepare:
     cmd: python src/prepare.py data/data.xml
@@ -14,3 +22,24 @@ stages:
       - prepare.split
     outs:
       - data/prepared
+  featurize:
+    cmd: python src/featurization.py data/prepared data/features
+    deps:
+      - data/prepared
+      - src/featurization.py
+    params:
+      - featurize.max_features
+      - featurize.ngrams
+    outs:
+      - data/features
+  train:
+    cmd: python src/train.py data/features model.pkl
+    deps:
+      - data/features
+      - src/train.py
+    params:
+      - train.min_split
+      - train.n_est
+      - train.seed
+    outs:
+      - model.pkl
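
Reviewer note: the new stages are chained through their `deps` and `outs`, since `featurize` consumes `data/prepared` and `train` consumes `data/features`. The sketch below shows one way the implied run order could be derived. It is an illustration, not the tool's own scheduler. The `STAGES` dict and the `stage_order` helper are hypothetical, and the `deps` listed for `prepare` are an assumption inferred from its `cmd`, because that part of the file is outside the diff. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Stage deps/outs copied from the hunks above. The `prepare` deps are not
# visible in this diff; they are assumed from its cmd line.
STAGES = {
    "prepare": {
        "deps": ["data/data.xml", "src/prepare.py"],
        "outs": ["data/prepared"],
    },
    "featurize": {
        "deps": ["data/prepared", "src/featurization.py"],
        "outs": ["data/features"],
    },
    "train": {
        "deps": ["data/features", "src/train.py"],
        "outs": ["model.pkl"],
    },
}


def stage_order(stages):
    """Order stages so every stage runs after the stages producing its deps."""
    # Map each output path to the stage that produces it.
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    # A stage depends on another stage only when a dep is that stage's output;
    # plain source files such as src/train.py have no producer and are skipped.
    graph = {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = stage_order(STAGES)
print(order)  # ['prepare', 'featurize', 'train']
```

Because the three stages form a single chain, the order is fully determined. With independent branches, `static_order` would return one of several valid orders.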