Jan. 6th, 2008

sraun: portrait (Default)
I have 208 HTML files. I need to find the first occurrence of text between H1 Tags - like so:

<H1 ALIGN=CENTER>
sample text
</H1>


and then drop that text in between the TITLE tags in the HEAD region. Yes, the sample text I need to grab is always on the line after the first H1 tag, and is always the only text on that line. The H1 tag is always early in the BODY region. I would love to automate this - I've got Perl, Python, and the standard Unix command-line text processing tools.

Anyone have any suggestions, magic invocations, or whatever? I know this can be done in Perl, probably fairly easily - but I don't do enough Perl to write it myself, and I can't conceptualize how to make the processing go backwards using the standard Unix tools.
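
For reference, here is a minimal Python sketch of the task as described above. Reading the whole file into memory sidesteps the "going backwards" problem, since the H1 text can be found first and the TITLE rewritten afterwards. It assumes each file already has a TITLE element, that the tags may be upper or lower case, and that the files can be round-tripped as Latin-1 bytes; the function name `retitle` and the script name used below are made up for illustration.

```python
import re
import sys
from pathlib import Path

# Opening H1 tag, with or without attributes, in any letter case.
H1_OPEN = re.compile(r"<h1\b[^>]*>", re.IGNORECASE)
# The first TITLE element; its contents may span several lines.
TITLE = re.compile(r"(<title\b[^>]*>).*?(</title>)", re.IGNORECASE | re.DOTALL)


def retitle(html):
    """Return html with the TITLE contents replaced by the line after the first H1 tag."""
    lines = html.splitlines()
    for i, line in enumerate(lines[:-1]):
        if H1_OPEN.search(line):
            # Assumption from the post: the heading text is alone on the next line.
            heading = lines[i + 1].strip()
            break
    else:
        return html  # no H1 tag found, leave the file untouched
    # A function replacement keeps backslashes in the heading from being
    # interpreted as regex escapes.
    return TITLE.sub(lambda m: m.group(1) + heading + m.group(2), html, count=1)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        path = Path(name)
        # Latin-1 maps every byte to a character, so odd bytes and the
        # original line endings survive the round trip unchanged.
        original = path.read_bytes().decode("latin-1")
        updated = retitle(original)
        if updated != original:
            path.write_bytes(updated.encode("latin-1"))
```

Saved as, say, `retitle.py`, it could be run over a backup copy of the files with `python retitle.py *.html`. It rewrites files in place, so trying it on a copy first seems wise.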
