[feature request] Fixed-width columns #224

aborruso · 2019-01-31T23:17:38Z

Hi,
is there a way to manage this kind of text file

OC  Yea     NO  Yea     DE  Yea     WI  Yea     SP  Yea     SU  Yea     AU  Yea     AN  Year
3.3 1919   -0.5 1919   -3.0 1981  -3.07 1963   1.95 1962   8.66 1972   3.96 1952   3.99 1917
2.9 1917   -0.7 1915   -4.1 2010               1.83 1917   8.24 1922   3.40 1919   3.76 1919
9.4 2001    6.4 1994    5.2 2015   3.03 1989   5.17 2014  11.34 2003   7.97 2011   6.30 2014

If I use --inidx --ifs ' ' --repifs I have problems with the two null values.

It's a fixed width columns text file, but I do not know how to read it with mlr.

Thank you

The text was updated successfully, but these errors were encountered:

johnkerl · 2019-02-01T13:14:28Z

if there are no embedded spaces in any column values, use --ipprint. if there are though ... there's no fixed-width ingestor.

aborruso · 2019-02-01T13:26:00Z

Hi @johnkerl ,
I had already tried with pprint but I have for that input mlr: Header-data length mismatch in file pprint at line 3. Is it ok?

thank you

aborruso · 2019-02-01T14:16:45Z

And I have no error if I remove the line with empty "cells", it works with

OC  Yea     NO  Yea     DE  Yea     WI  Yea     SP  Yea     SU  Yea     AU  Yea     AN  Year
3.3 1919   -0.5 1919   -3.0 1981  -3.07 1963   1.95 1962   8.66 1972   3.96 1952   3.99 1917
9.4 2001    6.4 1994    5.2 2015   3.03 1989   5.17 2014  11.34 2003   7.97 2011   6.30 2014

aborruso · 2019-02-03T17:55:33Z

I have find a solution using sed. The raw input file is this https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Tmin/ranked/UK.txt

My solution was to use sed in this way. At the end I have a CSV.

aborruso · 2022-02-09T23:06:51Z

@johnkerl what is NFR? Is it Non-Functional Requirements? What's the practical meaning?

Thank you

johnkerl · 2022-02-09T23:12:05Z

@aborruso sorry "New Feature Request" -- some issues were already using this abbreviation & some weren't & in a fit of hubris I regularized some issue titles recently :^/

Really unnecessary on my part since there's now the feature-request tag ...

moorereason · 2025-01-16T03:39:22Z

I wanted to see if mlr could solve this without more features. Assumes a known, fixed-width record format.

Given input from original post (iss224.txt):

OC  Yea     NO  Yea     DE  Yea     WI  Yea     SP  Yea     SU  Yea     AU  Yea     AN  Year
3.3 1919   -0.5 1919   -3.0 1981  -3.07 1963   1.95 1962   8.66 1972   3.96 1952   3.99 1917
2.9 1917   -0.7 1915   -4.1 2010               1.83 1917   8.24 1922   3.40 1919   3.76 1919
9.4 2001    6.4 1994    5.2 2015   3.03 1989   5.17 2014  11.34 2003   7.97 2011   6.30 2014

A mlr script (fixed-width.mlr):

$1 = joinv([
    strip(substr($1, 0, 2)),
    strip(substr($1, 4, 7)),
    strip(substr($1, 9, 14)),
    strip(substr($1, 16, 19)),
    strip(substr($1, 21, 26)),
    strip(substr($1, 28, 31)),
    strip(substr($1, 33, 38)),
    strip(substr($1, 40, 43)),
    strip(substr($1, 45, 50)),
    strip(substr($1, 52, 55)),
    strip(substr($1, 57, 62)),
    strip(substr($1, 64, 68)),
    strip(substr($1, 69, 74)),
    strip(substr($1, 76, 79)),
    strip(substr($1, 88, 91)),
  ], ",")

Command:

mlr -N --otsv cat then put -f iss224.mlr iss224.txt | mlr --csv cat > iss224.out

Output (iss224.out):

OC,Yea,NO,Yea_2,DE,Yea_3,WI,Yea_4,SP,Yea_5,SU,Yea_6,AU,Yea_7,Year
3.3,1919,-0.5,1919,-3.0,1981,-3.07,1963,1.95,1962,8.66,1972,3.96,1952,1917
2.9,1917,-0.7,1915,-4.1,2010,,,1.83,1917,8.24,1922,3.40,1919,1919
9.4,2001,6.4,1994,5.2,2015,3.03,1989,5.17,2014,11.34,2003,7.97,2011,2014

Smoke test

$ mlr --csv --opprint cat iss224.out
OC  Yea  NO   Yea_2 DE   Yea_3 WI    Yea_4 SP   Yea_5 SU    Yea_6 AU   Yea_7 Year
3.3 1919 -0.5 1919  -3.0 1981  -3.07 1963  1.95 1962  8.66  1972  3.96 1952  1917
2.9 1917 -0.7 1915  -4.1 2010  -     -     1.83 1917  8.24  1922  3.40 1919  1919
9.4 2001 6.4  1994  5.2  2015  3.03  1989  5.17 2014  11.34 2003  7.97 2011  2014

aborruso · 2025-01-16T07:11:22Z

Hi @moorereason for your input file

sed -r 's/ {6}/|/g;s/  */|/g' inp.txt | mlr --csv --ifs "|" cat

In output

OC,Yea,NO,Yea_2,DE,Yea_3,WI,Yea_4,SP,Yea_5,SU,Yea_6,AU,Yea_7,AN,Year
3.3,1919,-0.5,1919,-3.0,1981,-3.07,1963,1.95,1962,8.66,1972,3.96,1952,3.99,1917
2.9,1917,-0.7,1915,-4.1,2010,,,1.83,1917,8.24,1922,3.40,1919,3.76,1919
9.4,2001,6.4,1994,5.2,2015,3.03,1989,5.17,2014,11.34,2003,7.97,2011,6.30,2014

moorereason · 2025-01-16T15:58:16Z

I didn't notice the 6-character pattern. Duh. Here's a simpler version without the need for sed:

mlr -N --otsv cat then put '$1 = gsub(gsub($1, " {6}", ","), " +", ",")' iss224.txt

aborruso mentioned this issue Feb 3, 2019

how to work with duplicated field names? #225

Closed

johnkerl changed the title ~~fixed width columns~~ fixed-width columns Feb 24, 2019

johnkerl added the wishlist label Feb 24, 2019

johnkerl changed the title ~~fixed-width columns~~ [NFR] Fixed-width columns Dec 31, 2021

johnkerl added the feature-request label Jan 1, 2022

johnkerl added the help wanted label Feb 9, 2022

johnkerl changed the title ~~[NFR] Fixed-width columns~~ [feature request] Fixed-width columns Nov 27, 2022

johnkerl removed the wishlist label Jun 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feature request] Fixed-width columns #224

[feature request] Fixed-width columns #224

aborruso commented Jan 31, 2019

johnkerl commented Feb 1, 2019

aborruso commented Feb 1, 2019

aborruso commented Feb 1, 2019

aborruso commented Feb 3, 2019

aborruso commented Feb 9, 2022

johnkerl commented Feb 9, 2022

moorereason commented Jan 16, 2025

aborruso commented Jan 16, 2025

moorereason commented Jan 16, 2025

[feature request] Fixed-width columns #224

[feature request] Fixed-width columns #224

Comments

aborruso commented Jan 31, 2019

johnkerl commented Feb 1, 2019

aborruso commented Feb 1, 2019

aborruso commented Feb 1, 2019

aborruso commented Feb 3, 2019

aborruso commented Feb 9, 2022

johnkerl commented Feb 9, 2022

moorereason commented Jan 16, 2025

aborruso commented Jan 16, 2025

moorereason commented Jan 16, 2025