perlmike has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file converted from HTML. I need to format this file and output it at a standard page size (8.5 x 11 inches), similar to a regular MS Word page, but still in text format. Below is my code. Any comments are greatly appreciated. Thank you!

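use strict;
use warnings;

my $input  = 'C:\Users\xxx\Documents\input.txt';
my $output = 'C:\Users\xxx\Documents\output.txt';

# Stop with a useful message if either file cannot be opened.
open(my $in,  '<', $input)  or die "Cannot open $input: $!";
open(my $out, '>', $output) or die "Cannot open $output: $!";

# Slurp the whole file into one string.
my $data = do { local $/; <$in> };

# $newdata was printed without ever being assigned; the formatting step goes here.
my $newdata = $data;

print {$out} $newdata;
close $in;
close $out or die "Cannot close $output: $!";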

Re: Output Format
by Polyglot (Chaplain) on Dec 12, 2022 at 04:00 UTC
    If you are able to do so, I would suggest going back to the HTML version of this text and using its clues to help you format the text-only version. At a minimum, you will want paragraph information, as I expect you do not wish to lump the entire body of text into a single, unbroken paragraph. If that is not possible, you may wish to manually insert some paragraph markers. Paragraphs could be "created" based on a set number of sentences per paragraph, or a certain number of lines as the maximum per paragraph, but this would mean awkward paragraph breaks, as a computer and a human will not read the text the same way.
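The fixed-sentence-count idea could be sketched roughly as below. This is only an illustration: `paragraphs_by_sentence_count` is a hypothetical helper, and the sentence-boundary rule (a period, exclamation mark or question mark followed by a capital letter, with or without a space) is an assumed heuristic that will misfire on abbreviations and initials.

```perl
use strict;
use warnings;

# Hypothetical helper: group sentences into paragraphs of a fixed size.
sub paragraphs_by_sentence_count {
    my ($text, $per_paragraph) = @_;

    # Assumed heuristic: a sentence ends at ., ! or ? and the next one
    # starts with a capital letter, with or without whitespace in between.
    my @sentences = split /(?<=[.!?])\s*(?=[A-Z])/, $text;

    my @paragraphs;
    while (my @group = splice @sentences, 0, $per_paragraph) {
        push @paragraphs, join ' ', @group;
    }
    return @paragraphs;
}

my $text = 'Revenue is recognized ratably.Overages are recorded in SaaS revenue. '
         . 'Fees are fixed. Collection is reasonably assured. We defer the rest.';

# Separate the paragraphs with a blank line.
print join("\n\n", paragraphs_by_sentence_count($text, 2)), "\n";
```

As the post says, breaks chosen this way will often land in awkward places; the sketch only shows that the mechanical part is simple.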

    My next suggestion would be to use LaTeX to format the text. It could be output to PDF. If you start with the HTML version, there may even be a tool (search for "HTML to LaTeX" online) that could do this without needing to reinvent wheels in Perl. But even from the text alone, LaTeX could do wonders with formatting, and it can be set for virtually any paper size, with custom margins, font sizes, etc.

    Blessings,

    ~Polyglot~

      Seconding this; throwing away all the markup and then expecting to be able to reproduce the layout (which would normally be derived from said markup) is (to put it bluntly) inane. All you could hope for from the undifferentiated one-long-line mess is to throw it through something like Text::Wrap and hope. If you still had the HTML you could try something like pandoc and see what it can come up with.
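A minimal sketch of the Text::Wrap route, under assumptions that are not from the thread: a monospaced font at 10 characters per inch and 6 lines per inch with 1 inch margins, which gives 65 columns (6.5 in x 10) and 54 lines (9 in x 6) per 8.5 x 11 inch page. The `paginate` helper and the sample text are made up for illustration.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Assumed page geometry (not from the thread): 8.5 x 11 in paper,
# 1 in margins, 10 characters per inch, 6 lines per inch.
my $COLS_PER_LINE  = 65;   # (8.5 - 2) in * 10 cpi
my $LINES_PER_PAGE = 54;   # (11 - 2) in * 6 lpi

# Hypothetical helper: wrap one long string, then cut it into pages.
sub paginate {
    my ($text) = @_;

    # Text::Wrap keeps every line to at most $columns - 1 characters.
    local $Text::Wrap::columns = $COLS_PER_LINE + 1;
    my @lines = split /\n/, wrap('', '', $text);

    my @pages;
    while (my @page = splice @lines, 0, $LINES_PER_PAGE) {
        push @pages, join("\n", @page) . "\n";
    }
    return @pages;
}

# Made-up input standing in for the one-long-line file.
my $sample = join ' ',
    ('Recurring revenue is recognized ratably over the stated contractual period.') x 60;

my @pages = paginate($sample);

# A form feed between pages is commonly treated as a page break when printing.
print join("\f", @pages);
```

This only controls line width and page length; it cannot recover headings, lists or paragraph breaks that were lost with the markup.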

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

        Thank you for your suggestions!

      You are right. I will go back to the HTML files and go from there. Thank you for the input!

Re: Output Format
by perlmike (Novice) on Dec 12, 2022 at 03:21 UTC

    The text file is like a single string without any newlines:

    Revenue RecognitionWe generate revenue by providing software as a service ("SaaS") solutions through on-demand subscription, on-premise perpetual and term licenses and related software maintenance, and services. Amounts that have been invoiced are recorded in accounts receivable and in deferred revenue or revenue, depending on whether the revenue recognition criteria have been met. Recurring Revenue. Recurring revenue, which includes SaaS revenue and maintenance revenue, is recognized ratably over the stated contractual period. SaaS revenue consists of subscription fees from customers accessing our cloud-based service offerings. Maintenance revenue consists of fees from customers purchasing licenses and receiving support for such on-premise solutions. We also recognize SaaS and maintenance revenue associated with customers using our solutions in excess of contracted usage ("Overages"). Overages are primarily attributed to SaaS products and are recorded in SaaS revenue in the period incurred. Revenue related to Overages was immaterial for all years presented.Service and License Revenue. Service and license revenue primarily consists of services revenue related to training, integration and configuration services. Our professional services arrangements are generally billed on a time-and-materials basis. Time and material services are recognized as the services are rendered based on inputs to the project, such as billable hours incurred. For fixed-fee professional services arrangements, we recognize revenue under the proportional performance method of accounting and estimates the proportional performance on a monthly basis, utilizing hours incurred to date as a percentage of total estimated hours to complete the project. If we do not have a sufficient basis to measure progress toward completion, revenue is recognized upon completion. Service and license revenue also includes revenue from perpetual licenses, which is recognized upon delivery of the product, using the residual method, assuming all the other conditions for revenue recognition have been met. Revenue related to perpetual licenses was immaterial for all the years presented.In a limited number of arrangements with non-standard acceptance criteria, we defer the revenue until the acceptance criteria are satisfied. Reimbursements, including those related to travel and out-of-pocket expenses, are included in services and license revenue, and an equivalent amount of reimbursable expenses is included in cost of services and license revenue. In general, recurring revenue agreements are entered into for 12 to 36 months, and the professional services are performed within nine months of entering into a contract with the customer, depending on the size of integration.Our SaaS agreements provide specified service level commitments, excluding scheduled maintenance. The failure to meet this level of service availability may require us to credit qualifying customers a portion of their subscription and support fees. Based on our historical experience meeting its service level commitments, we do not currently have any liabilities on our consolidated balance sheets for these commitments.31We recognize revenue when all of the following conditions are met: * Persuasive evidence of an arrangement exists;* Delivery has occurred or services have been rendered;* The fees are fixed or determinable; and * Collection of the fees is reasonably assured. If we determine that any one of the four criteria is not met, we will defer recognition of revenue until all the criteria are met.Multiple-deliverable arrangements with on-demand subscription. For on-demand subscription agreements with multiple deliverables, we evaluate each element to determine whether it represents a separate unit of accounting. We determine the best estimated selling price of each deliverable in an arrangement based on a selling price hierarchy of methods contained in Finance Accounting Standards Board ("FASB") Accounting Standards Update ("ASU") No. 2009-13, Revenue Recognition (Accounting Standards Codification ("ASC") Topic 605)-Multiple-Deliverable Revenue Arrangements. The best estimated selling price for a deliverable is based on its vendor-specific objective evidence ("VSOE"), if available, third-party evidence ("TPE"), if VSOE is not available, or estimated selling price ("ESP"), if neither VSOE nor TPE is available. Total arrangement fees are allocated to each element using the relative selling price method. We have currently established VSOE for most deliverables, except for fixed fee service arrangements and on-premise software licenses. We considered all of the following factors to establish the ESP for fixed fee service arrangements when sold with its on-demand services: the weighted average actual sales prices of professional services sold on a stand-alone basis for on-demand services; average billing rates for fixed fee service agreements when sold with on-demand services, cost plus a reasonable mark-up and other factors such as gross margin objectives, pricing practices and growth strategy. Multiple-deliverable arrangements with on-premise license.