Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] Fixed-width columns #224

Open
aborruso opened this issue Jan 31, 2019 · 9 comments
Open

[feature request] Fixed-width columns #224

aborruso opened this issue Jan 31, 2019 · 9 comments

Comments

@aborruso
Copy link
Contributor

Hi,
is there a way to manage this kind of text file

OC  Yea     NO  Yea     DE  Yea     WI  Yea     SP  Yea     SU  Yea     AU  Yea     AN  Year
3.3 1919   -0.5 1919   -3.0 1981  -3.07 1963   1.95 1962   8.66 1972   3.96 1952   3.99 1917
2.9 1917   -0.7 1915   -4.1 2010               1.83 1917   8.24 1922   3.40 1919   3.76 1919
9.4 2001    6.4 1994    5.2 2015   3.03 1989   5.17 2014  11.34 2003   7.97 2011   6.30 2014

If I use --inidx --ifs ' ' --repifs I have problems with the two null values.

It's a fixed width columns text file, but I do not know how to read it with mlr.

Thank you

@johnkerl
Copy link
Owner

johnkerl commented Feb 1, 2019

if there are no embedded spaces in any column values, use --ipprint. if there are though ... there's no fixed-width ingestor.

@aborruso
Copy link
Contributor Author

aborruso commented Feb 1, 2019

Hi @johnkerl ,
I had already tried with pprint but I have for that input mlr: Header-data length mismatch in file pprint at line 3. Is it ok?

thank you

@aborruso
Copy link
Contributor Author

aborruso commented Feb 1, 2019

And I have no error if I remove the line with empty "cells", it works with

OC  Yea     NO  Yea     DE  Yea     WI  Yea     SP  Yea     SU  Yea     AU  Yea     AN  Year
3.3 1919   -0.5 1919   -3.0 1981  -3.07 1963   1.95 1962   8.66 1972   3.96 1952   3.99 1917
9.4 2001    6.4 1994    5.2 2015   3.03 1989   5.17 2014  11.34 2003   7.97 2011   6.30 2014

@aborruso
Copy link
Contributor Author

aborruso commented Feb 3, 2019

I have find a solution using sed. The raw input file is this https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Tmin/ranked/UK.txt

My solution was to use sed in this way. At the end I have a CSV.

@johnkerl johnkerl changed the title fixed width columns fixed-width columns Feb 24, 2019
@johnkerl johnkerl changed the title fixed-width columns [NFR] Fixed-width columns Dec 31, 2021
@aborruso
Copy link
Contributor Author

aborruso commented Feb 9, 2022

@johnkerl what is NFR? Is it Non-Functional Requirements? What's the practical meaning?

Thank you

@johnkerl
Copy link
Owner

johnkerl commented Feb 9, 2022

@aborruso sorry "New Feature Request" -- some issues were already using this abbreviation & some weren't & in a fit of hubris I regularized some issue titles recently :^/

Really unnecessary on my part since there's now the feature-request tag ...

@johnkerl johnkerl changed the title [NFR] Fixed-width columns [feature request] Fixed-width columns Nov 27, 2022
@johnkerl johnkerl removed the wishlist label Jun 25, 2023
@moorereason
Copy link

I wanted to see if mlr could solve this without more features. Assumes a known, fixed-width record format.

Given input from original post (iss224.txt):

OC  Yea     NO  Yea     DE  Yea     WI  Yea     SP  Yea     SU  Yea     AU  Yea     AN  Year
3.3 1919   -0.5 1919   -3.0 1981  -3.07 1963   1.95 1962   8.66 1972   3.96 1952   3.99 1917
2.9 1917   -0.7 1915   -4.1 2010               1.83 1917   8.24 1922   3.40 1919   3.76 1919
9.4 2001    6.4 1994    5.2 2015   3.03 1989   5.17 2014  11.34 2003   7.97 2011   6.30 2014

A mlr script (fixed-width.mlr):

$1 = joinv([
    strip(substr($1, 0, 2)),
    strip(substr($1, 4, 7)),
    strip(substr($1, 9, 14)),
    strip(substr($1, 16, 19)),
    strip(substr($1, 21, 26)),
    strip(substr($1, 28, 31)),
    strip(substr($1, 33, 38)),
    strip(substr($1, 40, 43)),
    strip(substr($1, 45, 50)),
    strip(substr($1, 52, 55)),
    strip(substr($1, 57, 62)),
    strip(substr($1, 64, 68)),
    strip(substr($1, 69, 74)),
    strip(substr($1, 76, 79)),
    strip(substr($1, 88, 91)),
  ], ",")

Command:

mlr -N --otsv cat then put -f iss224.mlr iss224.txt | mlr --csv cat > iss224.out

Output (iss224.out):

OC,Yea,NO,Yea_2,DE,Yea_3,WI,Yea_4,SP,Yea_5,SU,Yea_6,AU,Yea_7,Year
3.3,1919,-0.5,1919,-3.0,1981,-3.07,1963,1.95,1962,8.66,1972,3.96,1952,1917
2.9,1917,-0.7,1915,-4.1,2010,,,1.83,1917,8.24,1922,3.40,1919,1919
9.4,2001,6.4,1994,5.2,2015,3.03,1989,5.17,2014,11.34,2003,7.97,2011,2014

Smoke test

$ mlr --csv --opprint cat iss224.out
OC  Yea  NO   Yea_2 DE   Yea_3 WI    Yea_4 SP   Yea_5 SU    Yea_6 AU   Yea_7 Year
3.3 1919 -0.5 1919  -3.0 1981  -3.07 1963  1.95 1962  8.66  1972  3.96 1952  1917
2.9 1917 -0.7 1915  -4.1 2010  -     -     1.83 1917  8.24  1922  3.40 1919  1919
9.4 2001 6.4  1994  5.2  2015  3.03  1989  5.17 2014  11.34 2003  7.97 2011  2014

@aborruso
Copy link
Contributor Author

Hi @moorereason for your input file

sed -r 's/ {6}/|/g;s/  */|/g' inp.txt | mlr --csv --ifs "|" cat

In output

OC,Yea,NO,Yea_2,DE,Yea_3,WI,Yea_4,SP,Yea_5,SU,Yea_6,AU,Yea_7,AN,Year
3.3,1919,-0.5,1919,-3.0,1981,-3.07,1963,1.95,1962,8.66,1972,3.96,1952,3.99,1917
2.9,1917,-0.7,1915,-4.1,2010,,,1.83,1917,8.24,1922,3.40,1919,3.76,1919
9.4,2001,6.4,1994,5.2,2015,3.03,1989,5.17,2014,11.34,2003,7.97,2011,6.30,2014

@moorereason
Copy link

I didn't notice the 6-character pattern. Duh. Here's a simpler version without the need for sed:

mlr -N --otsv cat then put '$1 = gsub(gsub($1, " {6}", ","), " +", ",")' iss224.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants