stockbr has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm a newbie to Perl and need help with a script, or at least with getting started. The reason I want to do this in Perl is that these files can at times be rather large.

I have 2 files with data (see below).

File1

5 fields per line; each field is separated by a space.

The text field is 1 word.

# # # text #

11111111111111111111 22222222222 3333333333333333 text 44444

File 2

4 fields per line; each field is separated by a comma.

The first 2 numbers are a range.

The last 2 text fields can have 1-4 words per field.

#,#,text,text

11111111111111,44444444444444,"text text","text text text"

I need to find where the number in file1/column 3 (3333333333333333) falls within the range given by file2/columns 1 and 2, matching left to right.

So for this example this is a match, because 1 < 3 and 4 > 3.
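One way to read "matching left to right" (an assumption, since the question doesn't define it precisely) is plain string comparison: Perl's `lt`/`le`/`ge`/`gt` operators compare two strings character by character from the left, so digit strings of different lengths are compared by their leading digits rather than by numeric value. The helper name `in_range` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $num falls between $low and $high when
# compared as strings, i.e. digit by digit from the left (assumption:
# this is what "matching left to right" means).
sub in_range {
    my ($num, $low, $high) = @_;
    return ($low le $num && $num le $high) ? 1 : 0;
}

my $num  = '3333333333333333';   # file1, column 3
my $low  = '11111111111111';     # file2, column 1
my $high = '44444444444444';     # file2, column 2

print in_range($num, $low, $high) ? "match\n" : "no match\n";

# For contrast: a numeric comparison says no, because the 16-digit
# value is numerically larger than the 14-digit upper bound.
print $num <= $high ? "numeric match\n" : "no numeric match\n";
```

The contrast line is the point of the sketch: with numeric `<=` the example in the question would not match, so the comparison has to be string-based if the example is right.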

Then I need to take the whole line from file 1, append the 2 text fields from file 2 to the end of that line, and write the result to an output file.

End result in new output file:

11111111111111111111 22222222222 3333333333333333 DVL0005c 44444 "text text" "text text text"
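Appending the two text fields is a simple join. A minimal sketch, assuming the text fields have already been parsed out of file 2 without their surrounding quotes, and that the output should re-wrap each one in double quotes, separated by spaces, as in the example above. `build_output_line` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append the text fields from file 2 to the
# unchanged file 1 line, each wrapped in double quotes, space-separated.
sub build_output_line {
    my ($file1_line, @text_fields) = @_;
    return join ' ', $file1_line, map { qq{"$_"} } @text_fields;
}

my $line1 = '11111111111111111111 22222222222 3333333333333333 text 44444';
print build_output_line($line1, 'text text', 'text text text'), "\n";
```

Keeping the file 1 line untouched (rather than re-joining its split fields) preserves whatever spacing it originally had.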

Re: Please Help!
by toolic (Bishop) on May 24, 2016 at 17:15 UTC
Re: Please Help!
by ww (Archbishop) on May 24, 2016 at 18:43 UTC

    Also, please read about the use of code tags for DATA as well as for code. As you've presented your data, we cannot readily distinguish between a space and a tab. See Markup in the Monastery.
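On the space-versus-tab question: if the file 1 columns really are whitespace-separated and contain no embedded blanks (an assumption), Perl's special `split ' '` form (a single-space string, as opposed to the pattern `/ /`) splits on any run of whitespace and ignores leading whitespace, so it gives the same five fields either way. A small demonstration using the sample line, with some separators swapped for a tab and a double space:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $with_spaces = '11111111111111111111 22222222222 3333333333333333 text 44444';
# Same data, but with a tab and a double space as separators.
my $with_tabs   = "11111111111111111111\t22222222222  3333333333333333 text 44444";

for my $line ($with_spaces, $with_tabs) {
    # split ' ' splits on runs of any whitespace (spaces or tabs).
    my @fields = split ' ', $line;
    printf "%d fields, column 3 = %s\n", scalar @fields, $fields[2];
}
```

Both lines should report 5 fields with the same column 3, which is why posting the real data inside code tags still matters: it tells us whether a field can ever contain a blank.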

    AFTER reading the perlintro (http://perldoc.perl.org/perlintro.html) that toolic cited (it is also available from the command line on your own machine, assuming you've installed any standard Perl: e.g., C:\> perldoc perlintro on a Windows machine, or perldoc perlintro at your bash (etc.) prompt on a *nix-ish box), you may wish to visit our well-stocked Tutorials, starting with those on reading from a file.

Re: Please Help!
by Marshall (Canon) on May 25, 2016 at 10:40 UTC
    You didn't give us much to work with. I'll try to give you a bit more direction on "how to get started" beyond reading the tutorials.

    First, read the Markup FAQ to help you format your posts. Also, "Help" as a title is not "helpful" to us.

    You don't say what programming experience, if any, you have, although your post suggests that you already have some way other than Perl to solve this problem.

    In general, I would recommend a 3-stage approach:

    1. Write a program to read file 1 and parse it correctly.
    2. Write a program to read file 2 and parse it correctly.
    3. Then combine program 1 and program 2 and start making decisions.
    A basic program #1 could go like this (of course untested as I have nothing to test with):
        #!/usr/bin/perl
        use warnings;
        use strict;

        open FILE1, '<', "file1name" or die "input1 error: $!";
        while (my $line = <FILE1>) {
            chomp $line;    # remove line endings
            my ($name1, $name2, $name3, $name4, $name5) = split ' ', $line;
            # use some names that describe what your columns really
            # mean; col1, col2 is usually a bad idea.
            # put some print statements here to make sure that
            # you can actually get the 5 individual things
        }
        close FILE1;
    The simplest version of program 2 is similar, except that since it is a CSV (Comma Separated Values) file, you need a different kind of split:
        my ($nameOfNumberField1, $nameOfNumberField2, $nameOfTextField1, $nameOfTextField2) = split /,/, $line;
    This simple approach won't work if there are embedded commas in the text, like "Bob Smith, Jr.". I don't want to overcomplicate things if these more complex cases never come up. Program 2 will help you figure out whether this is an issue or not.
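If embedded commas do turn out to matter, one option that needs nothing outside core Perl is `parse_line` from the Text::ParseWords module, which honours double quotes and, with its second ("keep") argument false, strips them. This is a swapped-in technique, not the plain split suggested above; note that it also treats single quotes and backslashes specially, and for messier CSV the CPAN module Text::CSV is the usual recommendation. A sketch against the sample file 2 line plus a hypothetical line with an embedded comma:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my @lines = (
    '11111111111111,44444444444444,"text text","text text text"',
    # Hypothetical line with an embedded comma inside a quoted field.
    '5,9,"Bob Smith, Jr.","text"',
);

for my $line (@lines) {
    # A plain split keeps the quotes and breaks on the embedded comma.
    my @naive = split /,/, $line;
    # parse_line(delimiter, keep, line): keep = 0 removes the quotes.
    my @fields = parse_line(',', 0, $line);
    printf "split: %d fields; parse_line: %d fields (%s)\n",
        scalar @naive, scalar @fields, join ' | ', @fields;
}
```

On the second line the plain split yields 5 fields while `parse_line` yields the intended 4, with `Bob Smith, Jr.` kept whole.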

    Make a stab at the combined program 3 and let us know what issues you are having. If line 5 of input 1 always goes with line 5 of input 2, then this is much easier than a more general situation, but I can't tell from your description what the full requirements are.
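For the general case (file 1 and file 2 lines are not paired by position), a minimal sketch of program 3 might load the file 2 ranges into memory once and then stream file 1 line by line, so only file 2 has to fit in memory. It assumes that "matching left to right" means Perl string comparison (`le`), which compares digit by digit from the left; that file 2 is small enough to hold in memory; that the first matching range wins; and that unmatched file 1 lines are simply skipped. The sub names and the command-line usage are hypothetical. If the ranges never overlap, sorting them and using a binary search would beat the linear scan used here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Read file 2 into a list of [low, high, text1, text2] records.
sub load_ranges {
    my ($fh) = @_;
    my @ranges;
    while (my $line = <$fh>) {
        chomp $line;
        my @f = parse_line(',', 0, $line);   # handles quoted fields
        next unless @f == 4;                 # skip blank/malformed lines
        push @ranges, [@f];
    }
    return \@ranges;
}

# Stream file 1; print each line whose column 3 falls in a range, with
# the two text fields appended in double quotes. Returns match count.
sub merge {
    my ($in_fh, $ranges, $out_fh) = @_;
    my $matches = 0;
    while (my $line = <$in_fh>) {
        chomp $line;
        my @fields = split ' ', $line;
        next unless @fields == 5;            # skip malformed lines
        my $num = $fields[2];                # file 1, column 3
        for my $r (@$ranges) {
            my ($low, $high, @texts) = @$r;
            # String comparison = digit by digit, left to right.
            if ($low le $num && $num le $high) {
                print {$out_fh} join(' ', $line, map { qq{"$_"} } @texts), "\n";
                $matches++;
                last;                        # first matching range wins
            }
        }
    }
    return $matches;
}

# Hypothetical usage: perl merge.pl file1.txt file2.txt output.txt
if (@ARGV == 3) {
    my ($file1, $file2, $outfile) = @ARGV;
    open my $fh2, '<', $file2 or die "Cannot open $file2: $!";
    my $ranges = load_ranges($fh2);
    close $fh2;
    open my $fh1, '<', $file1   or die "Cannot open $file1: $!";
    open my $out, '>', $outfile or die "Cannot open $outfile: $!";
    merge($fh1, $ranges, $out);
    close $fh1;
    close $out or die "Cannot close $outfile: $!";
}
```

Splitting the work into `load_ranges` and `merge`, both taking filehandles, also makes each half easy to test on its own, in the spirit of programs 1 and 2.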

    I hope that my advice on some of the basics, (a) opening the file and (b) basic parsing, can get you started. Read about open and split. Good luck on your journey; it will require a lot of work.