Anyone good with text files?

Soldato
Joined
16 Dec 2005
Posts
4,054
Location
Halifax W'Yorkshire
Howdy, I'm in need of an easy way to extract things from text files. For example, I have 8000 lines which consist of things like:
{ 1, 47855, "", "-", "=ds=#s10#, #a1#"};
{ 2, 47857, "", "=q4", "=ds=#s3#, #a2#"};

etc. The part I need is the 5-digit number from each row.
Is there a way to extract this part from each row?

thanks
 
Soldato
Joined
18 Nov 2011
Posts
4,215
Location
London
First thought is that you could probably open it as a CSV in Excel. The B column will then contain the 5-digit numbers, since they have a comma on both sides. That may not be how you would like it formatted, but it should isolate just the numbers.
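If you'd rather script the same CSV idea than open Excel, something like this might do. It's only a sketch in Python: the `lines` list stands in for your file, and `skipinitialspace=True` is my assumption about how to cope with the space after each comma.

```python
import csv

# Sample rows copied from the original post.
lines = [
    '{ 1, 47855, "", "-", "=ds=#s10#, #a1#"};',
    '{ 2, 47857, "", "=q4", "=ds=#s3#, #a2#"};',
]

# skipinitialspace=True drops the space after each comma, so the quoted
# fifth field (which contains its own comma) stays in one piece.
reader = csv.reader(lines, skipinitialspace=True)

# Index 1 is the second field, i.e. what Excel would show as column B.
ids = [row[1] for row in reader]
print(ids)
```

For a real file you would pass the open file object to `csv.reader` instead of the list.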
 
Soldato
Joined
18 Oct 2012
Posts
8,333
You can also manually set the "width" of each column when doing this, so you could extract your data that way, assuming the 5-digit numbers all appear in the same place on each row.
 
Soldato
Joined
25 Jun 2011
Posts
5,468
Location
Yorkshire and proud of it!
Here is an easy way to get what you want:

Install Notepad++ (this will take only a moment).

Open the text file in it.

Hold down the LEFT ALT-KEY and drag the mouse cursor in a column over only the five digit numbers. When done, hit Copy (Ctrl-C) and then paste the copied numbers into a new text file in a new tab in Notepad++.

Note: I expect you will need to do this four times. Your initial number presumably increments beyond single digits, so the five-digit group shifts one position to the right each time it goes from units to tens, to hundreds, to thousands.

If that's no good, it's a fairly easy regular expression.
 
Soldato
Joined
25 Jun 2011
Posts
5,468
Location
Yorkshire and proud of it!
Can't check at the mo but would suggest Excel, paste data into column A then in column B, "=MID(A1,6,5)", fill down for all rows.

Just downloading excel now, will try these suggestions and report back, ty :)

This is unlikely to work as written because the moment your initial number grows beyond 9, the formula is no longer correct. Seriously, try Notepad++ and just highlight the numbers you want.
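To show the shift concretely, here is a small Python sketch comparing a fixed-position slice (the same idea as the MID formula) with splitting on commas. The second row is made up by me to have a two-digit index, and the function names are just for illustration.

```python
# First row is from the original post; the second is a made-up row with a
# two-digit index to show the shift.
rows = [
    '{ 1, 47855, "", "-", "=ds=#s10#, #a1#"};',
    '{ 10, 47900, "", "-", "=ds=#s5#, #a1#"};',  # hypothetical row
]

def by_position(line):
    # Same idea as =MID(A1,6,5): five characters starting at the sixth.
    return line[5:10]

def by_comma(line):
    # Take whatever sits between the first and second comma.
    return line.split(",")[1].strip()

for line in rows:
    print(repr(by_position(line)), repr(by_comma(line)))
```

The fixed slice is only right for the first row; the comma split does not care how many digits the index has.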

EDIT:

notepad.png
 
Soldato
Joined
3 Apr 2003
Posts
3,938
Location
InterZone
If your file always positions the 5 digit number in the same place, use awk to grab it and then sed to strip the comma:

[simon@clarknova ~]$ cat testfile | awk {'print $3'} | sed 's/,//g'
47855
47857
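If awk and sed aren't to hand, roughly the same thing can be sketched in Python. This assumes, like the awk version, that there is a space after the opening brace so the number is the third whitespace-separated field; `third_field` is a name I made up.

```python
def third_field(line):
    # split() with no arguments splits on runs of whitespace, like awk's
    # default field separator; index 2 corresponds to awk's $3.
    # replace() then removes the commas, like sed 's/,//g'.
    return line.split()[2].replace(",", "")

sample = [
    '{ 1, 47855, "", "-", "=ds=#s10#, #a1#"};',
    '{ 2, 47857, "", "=q4", "=ds=#s3#, #a2#"};',
]
for line in sample:
    print(third_field(line))
```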
 
Soldato
Joined
25 Jun 2011
Posts
5,468
Location
Yorkshire and proud of it!
Ah, sed and awk. GNU's answer to the question: "how do I combine different tools on an OS that is too old to have object pipelining and inexcusably uses unnamed parameters?" ;P
 

Pho

Soldato
Joined
18 Oct 2002
Posts
9,324
Location
Derbyshire
As mentioned, Notepad++ and Regex will do it. It has a JSON plugin you could probably use instead of Regex to extract what you need, but anyway:

Search > Replace

Find what: \{ ?(\d+), ?(\d+), ?\"(.*?)\", ?\"(.*?)\", ?\"(.*?)\"\};
Replace with: \1 for first column (1, 2)
Replace with: \2 for second column (47855, 47857)
Replace with: \3 for third column
Replace with: \4 for fourth column
Replace with: \5 for fifth column

Make sure you have search mode set to regular expression and click replace all.

You'll want to use \2 to extract the ID.

That's a crude Regex but it works :p
KqDgeQk.png
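Here is a rough Python equivalent of that find-and-replace, in case a script suits better than the editor. It's a sketch only: the names `ROW` and `text` are made up, and the pattern is my port of the Notepad++ one rather than anything official.

```python
import re

# Hypothetical Python port of the Notepad++ pattern: one capture group per
# column. The opening brace is escaped and fields are separated by commas.
ROW = re.compile(r'\{ ?(\d+), ?(\d+), ?"(.*?)", ?"(.*?)", ?"(.*?)"\};')

# Sample rows copied from the original post.
text = '''{ 1, 47855, "", "-", "=ds=#s10#, #a1#"};
{ 2, 47857, "", "=q4", "=ds=#s3#, #a2#"};'''

# Equivalent of "Replace All" with \2 as the replacement text.
ids_only = ROW.sub(r"\2", text)
print(ids_only)
```

Swapping `\2` for another group number pulls out a different column, same as in the editor.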
 
Soldato
Joined
25 Jun 2011
Posts
5,468
Location
Yorkshire and proud of it!
Nice that you gave a regex for every column and explained how to pick out a column of choice. I'm concerned by the OP's comment that they need to do this for "files". I hope the OP isn't planning to do this regularly. If so, I think something has gone wrong: the sample file is clearly meant to be read programmatically, so I suspect the OP is working with an incomplete solution.
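If it does turn out to be a regular job across several files, a small script may be less painful than repeating the editor steps. This is a minimal Python sketch under my own assumptions: the files are UTF-8, they sit in one folder, they end in .txt, and the function names are invented for illustration.

```python
import re
from pathlib import Path

# Matches the second number in a row such as: { 1, 47855, "", ...
SECOND_NUMBER = re.compile(r"^\s*\{\s*\d+\s*,\s*(\d+)\s*,")

def extract_ids(path):
    """Return the second numeric field from every matching line in a file."""
    ids = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            match = SECOND_NUMBER.match(line)
            if match:  # lines that don't look like a data row are skipped
                ids.append(match.group(1))
    return ids

def extract_from_folder(folder, pattern="*.txt"):
    """Map each file name in the folder to the IDs found in it."""
    return {p.name: extract_ids(p) for p in sorted(Path(folder).glob(pattern))}
```

Calling `extract_from_folder` on a folder path (whichever one holds the files) gives back a dict of file name to list of IDs.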
 