ISO Text Processing Help
Jan. 6th, 2008 03:16 pm
I have 208 HTML files. I need to find the first occurrence of text between H1 tags, like so:
<H1 ALIGN=CENTER>
sample text
</H1>
and then insert that text between the TITLE tags in the HEAD region. Yes, the sample text I need to grab is always on the line after the first H1 tag, and is always the only text on that line. The H1 tag is always early in the BODY region. I would love to automate this; I've got Perl, Python, and the standard Unix command-line text processing tools.
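A minimal Python sketch of that transformation for a single page. It assumes what the post describes: the first H1 opening tag (e.g. `<H1 ALIGN=CENTER>`) ends its line, the wanted text is the whole next line, and the page has a TITLE element. Regexes are only reasonable here because the layout is that regular; a real HTML parser would be sturdier. The names `retitle`, `H1_NEXT_LINE` and `TITLE` are made up for illustration.

```python
import re

# Assumption: the first <H1 ...> opening tag ends its line and the wanted
# text is the whole of the following line, as described in the post.
H1_NEXT_LINE = re.compile(r"<h1\b[^>]*>[ \t]*\r?\n([^\r\n]*)", re.IGNORECASE)
# Non-greedy .*? stops at the first </TITLE>; DOTALL lets it span lines.
TITLE = re.compile(r"(<title\b[^>]*>).*?(</title>)", re.IGNORECASE | re.DOTALL)


def retitle(html: str) -> str:
    """Copy the line after the first H1 tag into the TITLE element."""
    found = H1_NEXT_LINE.search(html)  # search() returns the first match only
    if found is None:
        return html  # no H1 in the expected layout: leave the page alone
    heading = found.group(1).strip()
    # A replacement function (not a string) keeps any backslashes in the
    # heading from being read as regex group references.
    return TITLE.sub(lambda m: m.group(1) + heading + m.group(2), html, count=1)
```

Pages without a matching H1 come back unchanged, and a page with no TITLE element is also returned as-is, since `sub` finds nothing to replace.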
Anyone have any suggestions, magic invocations, or whatever? I know this can be done in Perl, probably fairly easily, but I don't do enough Perl to write it myself, and I can't conceptualize how to make the processing go backwards using the standard Unix tools.
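On going backwards: tools that work a line at a time, like sed, normally see the TITLE before they ever reach the H1. The usual way around that is to read each whole file into memory first, find the heading (pass 1), and only then rewrite the earlier TITLE (pass 2). Below is a hedged Python sketch of that approach over a directory. It assumes the pages end in `.html`, are in some 8-bit encoding (decoded as Latin-1 so any bytes round-trip unchanged, line endings included), and that editing in place is acceptable, so try it on a copy of the files first. `find_heading`, `retitle_file` and `retitle_dir` are hypothetical names.

```python
import re
from pathlib import Path

# First <TITLE ...>...</TITLE> pair, case-insensitive, possibly spanning lines.
TITLE = re.compile(r"(<title\b[^>]*>).*?(</title>)", re.IGNORECASE | re.DOTALL)


def find_heading(lines):
    """Pass 1: return the stripped line after the first line starting with <H1."""
    for i, line in enumerate(lines):
        # Assumption: the H1 opening tag begins its own line.
        if line.lstrip().upper().startswith("<H1") and i + 1 < len(lines):
            return lines[i + 1].strip()
    return None


def retitle_file(path: Path) -> bool:
    """Pass 2: put the heading into TITLE; return True if the file changed."""
    # Reading the whole file up front is what makes "backwards" easy:
    # text found late in the file can now be used to edit the HEAD.
    text = path.read_bytes().decode("latin-1")
    heading = find_heading(text.splitlines())
    if heading is None:
        return False
    new_text, count = TITLE.subn(
        lambda m: m.group(1) + heading + m.group(2), text, count=1
    )
    if count == 0 or new_text == text:
        return False  # no TITLE element, or it already holds the heading
    path.write_bytes(new_text.encode("latin-1"))
    return True


def retitle_dir(directory: str, pattern: str = "*.html") -> int:
    """Apply retitle_file to every matching file; return how many changed."""
    changed = 0
    for path in sorted(Path(directory).glob(pattern)):
        if retitle_file(path):
            changed += 1
    return changed
```

Calling `retitle_dir("/path/to/pages")` returns the number of files it rewrote; pages with no H1 in the expected layout, or no TITLE, are left untouched. If some files end in `.htm`, pass `pattern="*.htm"` as well.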