Fshah has asked for the wisdom of the Perl Monks concerning the following question:
Hi there, I have a large log file in .txt format that has several pages. Each page has a different format, with different columns pertaining to different fields. I want to separate each column into an array so that it can be processed later.

    place and year data: 67
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    |no.| name | age | place | year |
    |_ _|_ _ _ _|_ _ _ | _ _ _ | _ _ |
    |1 | sue |33 | NY | 2015 |
    |2 | mark |28 | cal | 2106 |
    work and language :65
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    |no.| name | languages | proficiency | time taken|
    |_ _| _ _ _| _ _ _ _ _ |_ _ _ _ _ _ _| _ _ _ _ _ |
    |1 | eliz | English | good | 24 hrs |
    |2 | susan| Spanish | good | 13 hrs |
    |3 | danny| Italian | decent | 21 hrs |
    Position log
    | | |Pos |value | |bulk|lot| prev| newest|
    |# |Locker|(dfg) |(no) |nul|val |Id | val |val |
    -----------------------------------------------------------
    | 0| 1| 302832| -11.88| 1| 0|Pri| 16| 0|
    | 1| 9| 302836| 11.88| 9| 0|Pri| 10| 0|
    | 2| 1| 302832| -11.88| 5| 3|Pri| 14| 4|
    | 3| 3| 302833| 11.88| 1| 0|sec| 12| 0|
    | 4| 6| 302837| -11.88| 1| 0|Pri| 16| 3|

I want each of these columns in an array named after its column heading in the table, containing that column's values. Thank you.
Replies are listed 'Best First'.
Re: Parsing .txt into arrays
by hippo (Archbishop) on May 24, 2017 at 09:51 UTC
There seem to be three tasks here: Which of these is giving you trouble? What have you tried? How did it fail? An SSCCE is always welcome.
by Fshah (Initiate) on May 25, 2017 at 04:24 UTC
I'm still stuck on 1. First, I want to split this txt file up into pages (each page pertaining to a specific format). Similar pages share the same keywords: for example, a work and language table always has the same format (the same columns, although not the same number of rows). I want to search the entire txt file for these tables using the keyword "work and languages" and populate each work and languages table into a different array. How can I go about this? Throughout this description I've used pages and tables interchangeably.
by huck (Prior) on May 25, 2017 at 07:07 UTC
So let's focus on hippo's step 1 first, since it is the key to the next steps. In some ways this is easy, and in other ways it will be hard. If I were to do this based on what you have shown us, I would start from a base you have already identified: one table begins with "work and language :65", another begins with "place and year data: 67", and yet another with "Position log". This takes that concept and uses what's sometimes called a state machine to separate the lines into table parts. I then kept going and parsed all the data into a hash of arrays of hashes. I realize it's not quite the output style you wanted, but it shows a lot of the techniques, and you could modify it to get what you want.
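huck's original code is not preserved in this copy of the thread. As a rough illustration of the idea, here is a minimal sketch that recognises a table by its title line and collects each data row as a hash keyed by column name. The keys `place_year` and `work_lang`, the state names, and the helper names are all invented for this sketch, and it only handles tables with a single header row (so not the Position log table).

```perl
use strict;
use warnings;

# Title patterns taken from the sample data; the keys are names
# invented for this sketch.
my @titles = (
    [ place_year => qr/^place and year data/ ],
    [ work_lang  => qr/^work and language/   ],
);

# Split "|a | b |" into ('a', 'b'), trimming whitespace around each cell.
# The s///r form needs Perl 5.14 or later.
sub cells {
    my ($line) = @_;
    $line =~ s/^\s*\|//;    # drop the leading pipe
    $line =~ s/\|\s*$//;    # drop the trailing pipe
    return map { s/^\s+|\s+$//gr } split /\|/, $line, -1;
}

# Returns { table_key => [ { column => value, ... }, ... ] }.
sub parse_tables {
    my @lines = @_;
    my (%tables, $key, @cols);
    my $state = 'SEEK_TITLE';
    for my $line (@lines) {
        # A title line always starts a new table, whatever the state.
        my ($hit) = grep { $line =~ $_->[1] } @titles;
        if ($hit) {
            $key   = $hit->[0];
            @cols  = ();
            $state = 'HEADER';
            next;
        }
        next unless $line =~ /^\s*\|/;      # ignore rule and blank lines
        if ($state eq 'HEADER') {
            @cols  = cells($line);
            $state = 'ROWS';
        }
        elsif ($state eq 'ROWS') {
            next if $line =~ /^[\s|_]+$/;   # underline row below the header
            my %row;
            @row{@cols} = cells($line);
            push @{ $tables{$key} }, \%row;
        }
    }
    return \%tables;
}

my @sample = (
    'place and year data: 67',
    '_ _ _ _ _ _ _ _ _',
    '|no.| name | age | place | year |',
    '|_ _|_ _ _ _|_ _ _ | _ _ _ | _ _ |',
    '|1 | sue |33 | NY | 2015 |',
    '|2 | mark |28 | cal | 2106 |',
    'work and language :65',
    '_ _ _ _ _ _ _ _ _',
    '|no.| name | languages | proficiency | time taken|',
    '|_ _| _ _ _| _ _ _ _ _ |_ _ _ _ _ _ _| _ _ _ _ _ |',
    '|1 | eliz | English | good | 24 hrs |',
);
my $tables = parse_tables(@sample);
print "$_->{name} speaks $_->{languages}\n" for @{ $tables->{work_lang} };
```

Keying the rows by column name keeps each value attached to its heading, which makes it straightforward to reshape the result into one array per column later.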
Re: Parsing .txt into arrays
by Eily (Monsignor) on May 24, 2017 at 09:42 UTC
Sample data would have been welcome. See "How do I post a question effectively?". Either your columns are separated by some given delimiter (e.g. a comma), in which case you can use Text::CSV, or your columns are aligned at given positions, in which case unpack can do the trick (although there might be a module that does it better).
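For the fixed-position case, a minimal sketch of the unpack approach could look like the following. The field widths (4, 8 and 5 characters) and the helper name `parse_fixed` are assumptions made up for this example; a real file would need widths measured from the actual data.

```perl
use strict;
use warnings;

# Assumed layout: three fixed-width fields of 4, 8 and 5 characters.
# The "A" format strips trailing spaces from each field.
my $template = 'A4 A8 A5';

sub parse_fixed {
    my ($line) = @_;
    return unpack $template, $line;
}

# Build a sample line with exactly those widths, then take it apart again.
my $line   = sprintf '%-4s%-8s%-5s', 1, 'sue', 33;
my @fields = parse_fixed($line);
print join(',', @fields), "\n";
```

The delimiter case is usually simpler still: a plain split on the separator works for well-behaved data, and Text::CSV is the safer choice once quoting or embedded delimiters are involved.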
Re: Parsing .txt into arrays
by Marshall (Canon) on May 25, 2017 at 08:53 UTC
Each table has 3 parts: 1) the name of the table, 2) the column definitions of the table, and 3) the data for each row in the table. All 3 of your example tables have these same 3 parts. The code below cycles through 3 states:
So anyway, the thinking goes: if we are in Phase 1, 2 or 3 and the line that we just read means that the current Phase has not ended, then we process the current line for the current Phase. Otherwise, the current Phase ends, any "clean-up" is done, and the overall state transitions to the next Phase. Below, the state transitions are 1->2->3->1->2->3->1, etc.
I wrote the code and it worked for the first 2 tables; then I found out that something was odd about table 3. So I used a special technique in Perl to code an exception to the rule for what ends the "finding Table Name" phase. In Perl, the "redo" statement restarts the while (condition){...} loop without re-evaluating the condition. In this case, we see that the COL_NAMES phase has already started, so I just adjust the Phase or State to be 'GET_COL_NAMES' and restart the loop without reading another line. There are of course other ways of accomplishing this same goal; this technique just happened to surface at the moment.
I didn't worry about tweaking the splits or regexes. Often this just doesn't matter, as disk I/O is usually the slowest part. The main thing I wanted to show in this post was a method to section the code into easily identifiable states or phases. Some details of how each 'state' is handled could be different, but that is not my main point.
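Marshall's code is not included in this copy of the thread, so the following is only a sketch of the three-phase loop and the redo exception described above. GET_NAME and GET_COL_NAMES are the state names mentioned in the thread; GET_ROWS, the data layout, and the helper names are assumptions made for this example.

```perl
use strict;
use warnings;

# Split "|a | b |" into trimmed cells (s///r needs Perl 5.14+).
sub cells {
    my ($line) = @_;
    $line =~ s/^\s*\|//;
    $line =~ s/\|\s*$//;
    return map { s/^\s+|\s+$//gr } split /\|/, $line, -1;
}

# Returns a list of { name => ..., cols => [...], rows => [ [...], ... ] }.
sub parse_phases {
    my @queue = @_;
    my (@tables, $table, $line);
    my $state = 'GET_NAME';
    my $name  = '';
    while (defined($line = shift @queue)) {
        if ($state eq 'GET_NAME') {
            if ($line =~ /^\s*\|/) {
                # Exception: a header row with no rule line before it
                # (the "Position log" case). Switch phase and reprocess
                # this same line without reading another one.
                $state = 'GET_COL_NAMES';
                redo;
            }
            elsif ($line =~ /^[\s_]*_[\s_]*$/) {
                $state = 'GET_COL_NAMES';   # rule line ends the name phase
            }
            elsif ($line =~ /\S/) {
                $name = $line;              # keep the last non-blank line
            }
        }
        elsif ($state eq 'GET_COL_NAMES') {
            $table //= { name => $name, cols => [], rows => [] };
            if ($line =~ /^[\s|_-]+$/) {    # separator ends the header
                $state = 'GET_ROWS';
                next;
            }
            # Stack multi-line headers column by column.
            my @c = cells($line);
            for my $i (0 .. $#c) {
                my $old = $table->{cols}[$i] // '';
                $table->{cols}[$i] = join ' ', grep { length } ($old, $c[$i]);
            }
        }
        else {                              # GET_ROWS
            if ($line =~ /^\s*\|/) {
                push @{ $table->{rows} }, [ cells($line) ];
            }
            else {
                # Table finished; this line may already name the next one.
                push @tables, $table;
                ($table, $name, $state) = (undef, '', 'GET_NAME');
                redo;
            }
        }
    }
    push @tables, $table if $table;
    return @tables;
}

my @tables = parse_phases(
    'place and year data: 67',
    '_ _ _ _ _ _ _ _ _',
    '|no.| name | age | place | year |',
    '|_ _|_ _ _ _|_ _ _ | _ _ _ | _ _ |',
    '|1 | sue |33 | NY | 2015 |',
    '|2 | mark |28 | cal | 2106 |',
    'Position log',
    '| | |Pos |value | |bulk|lot| prev| newest|',
    '|# |Locker|(dfg) |(no) |nul|val |Id | val |val |',
    '-----------------------------------------------',
    '| 0| 1| 302832| -11.88| 1| 0|Pri| 16| 0|',
);
print "$_->{name}: ", join(', ', @{ $_->{cols} }), "\n" for @tables;
```

The two redo calls are the only places where a line is examined twice, and in both cases the state has just changed, so the loop cannot spin on the same line.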
by Fshah (Initiate) on May 26, 2017 at 05:53 UTC
My approach: say every Position log table carries the extension Fp379 and every page starts with the year. I want to use these keywords (Fp379 for Position log pages and 2017 for all pages) to separate out the required tables; likewise, similar pages start with the same extension.
by roboticus (Chancellor) on May 26, 2017 at 12:50 UTC
The $. variable contains the current line number of the last filehandle accessed. So you can simply store its value each time you start a new array and add the starting line number to your table of information. You can read more about the $. variable in perlvar.
...roboticus
When your only tool is a hammer, all problems look like your thumb.
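As a small illustration of recording the starting line with $., here is a sketch that reads from an in-memory filehandle. The helper name `title_lines` and the title pattern are assumptions chosen to match the sample data in the question.

```perl
use strict;
use warnings;

# Returns [ line_number, title ] for every line that looks like a table title.
sub title_lines {
    my ($fh, $title_re) = @_;
    my @found;
    while (my $line = <$fh>) {
        chomp $line;
        # $. is the number of the line just read from $fh.
        push @found, [ $., $line ] if $line =~ $title_re;
    }
    return @found;
}

# Demo on an in-memory file built from a few sample lines.
my $text = join "\n",
    'place and year data: 67',
    '|no.| name |',
    '|1 | sue |',
    'work and language :65',
    '|no.| name |',
    '|1 | eliz |',
    '';
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @starts = title_lines($fh, qr/^(?:place and year|work and language|Position log)/);
close $fh;
printf "line %d: %s\n", @$_ for @starts;
```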
by Marshall (Canon) on May 26, 2017 at 19:28 UTC
OK, for these extra requirements, I modified the GET_NAME state to allow for multi-line names instead of just keeping the last non-blank line before the table starts. Keeping track of the line numbers from the original file sounds weird, but I added that info to the $name record using $., the current line number of the current filehandle.
I would recommend just letting the code parse out each table that it encounters. In the finish_current_table() subroutine, decide whether or not you actually want to keep the current table. I just hard-coded a regex, /2017.*?Fp379/, but of course this could be more flexible. Note that to "keep" the table, I added it to a @results data structure, which I "dumped" right before the program ends. I would presume that in the "real code", instead of adding to the @results structure, some export() function would be called to put the table into a DB or to write a discrete file in some sort of CSV format. I did not generate strictly conformant CSV (multi-word strings should be quoted).
From the size of the input file you are describing, it sounds to me like putting these tables into a SQL DB is the right way to go. The Perl DBI is fantastic.
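The keep-or-discard decision described above can be sketched as follows. Only the names finish_current_table(), @results and the regex come from the post; the subroutine's arguments, the table layout (`name_lines`, `start_line`) and the sample names are assumptions invented for this illustration.

```perl
use strict;
use warnings;

my @results;    # tables we decided to keep

# Assumed signature: a table hash ref whose name_lines entry holds every
# title line seen before the header row, plus the filter regex.
sub finish_current_table {
    my ($table, $keep_re) = @_;
    my $name = join ' ', @{ $table->{name_lines} };
    push @results, $table if $name =~ $keep_re;
    return;
}

# Hypothetical parsed tables; the name lines are invented for illustration.
my @parsed = (
    { name_lines => [ '2017 page 1', 'Position log Fp379' ],    start_line => 1 },
    { name_lines => [ '2017 page 2', 'work and language :65' ], start_line => 40 },
);
finish_current_table($_, qr/2017.*?Fp379/) for @parsed;
printf "kept %d table(s); the first starts at line %d\n",
    scalar @results, $results[0]{start_line};
```

Passing the regex in as an argument is one way to make the hard-coded filter more flexible; replacing the push with a call to an export routine would be the other change the post anticipates.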
by Fshah (Initiate) on Jun 02, 2017 at 09:39 UTC
by Marshall (Canon) on Jun 04, 2017 at 01:44 UTC
Re: Parsing .txt into arrays
by karthiknix (Sexton) on May 24, 2017 at 10:44 UTC
- First, collect the data of one page and insert it into an array, using the newline character as the delimiter. Likewise, you should end up with as many arrays as there are pages in the log file.
- Use the map function to traverse each array, and use pattern matching to delimit each column value and push it into an array using the regex capture variables: map{if(/(.*)<Column Delimiter>(.*)/ig){<your code to assign each value into an array>}}@arr
- Name the array for each column based on its value so that you can identify it later in the code.
Now you have the data sorted into arrays.
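One way to realise these steps is sketched below, under the assumption that the lines of a single table have already been collected. Instead of creating separately named array variables, it uses a hash of arrays keyed by column heading, which gives the same "array named after the column" access without symbolic references. The helper name `columns_of` is invented for this example.

```perl
use strict;
use warnings;

# Turn the lines of one table into { column_name => [ values... ] }.
sub columns_of {
    # Keep pipe-delimited rows only, dropping underline/separator rows.
    my @lines = grep { /^\s*\|/ && !/^[\s|_-]+$/ } @_;
    my @rows = map {
        my $l = $_;
        $l =~ s/^\s*\|//;
        $l =~ s/\|\s*$//;
        [ map { s/^\s+|\s+$//gr } split /\|/, $l, -1 ]   # Perl 5.14+
    } @lines;

    my %col;
    my $header = shift @rows;       # the first remaining row names the columns
    return \%col unless $header;
    for my $row (@rows) {
        push @{ $col{ $header->[$_] } }, $row->[$_] for 0 .. $#{$header};
    }
    return \%col;
}

my $col = columns_of(
    '|no.| name | languages | proficiency | time taken|',
    '|_ _| _ _ _| _ _ _ _ _ |_ _ _ _ _ _ _| _ _ _ _ _ |',
    '|1 | eliz | English | good | 24 hrs |',
    '|2 | susan| Spanish | good | 13 hrs |',
    '|3 | danny| Italian | decent | 21 hrs |',
);
print join(', ', @{ $col->{languages} }), "\n";
```

With this shape, `@{ $col->{languages} }` plays the role of the "languages" array asked for in the question.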
Re: Parsing .txt into arrays
by thanos1983 (Parson) on May 24, 2017 at 11:24 UTC
Hello Fshah,
Welcome to the monastery gates and into the beautiful world of Perl.
Judging from your sample, this looks like database data. Are you sure you want to export it into text files? With a very simple script you can export only the required fields and data from your database. Is this something that you would like?
It is very common for all of us to try to do something in a way that is more complicated than it needs to be. Try to describe in simple words how your data gets populated into your txt files: is it through a database? If that is the case, then you should not be doing it like that. :D
Just describe what you are trying to do. The Monks here are very friendly and very, very experienced; they/we will try to assist as much as possible.
Looking forward to your update. Hope this helps.
Seeking for Perl wisdom...on the process of learning...not there...yet!
by Fshah (Initiate) on May 25, 2017 at 04:13 UTC
This is sample data I made up based on the format of the original data. The original data is simply a txt file from an unknown source, so all I have is this txt file. I want to populate my arrays so that each is named after a heading appearing in the table and contains the column under that heading. E.g., for the work and language table, I want an array (named "languages") with its contents as
Re: Parsing .txt into arrays
by Anonymous Monk on May 24, 2017 at 17:58 UTC
by Fshah (Initiate) on May 25, 2017 at 04:14 UTC