Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone,

I am writing my thesis on the effect of negative publicity on executive pay. I need to collect the number of negative articles per company/CEO.

In a recent study, the authors identified the articles (in Factiva) by using a Perl string sequence.

When I just enter the Perl string in the search box, I can't get output in the form Company Name-CEO Name-#articles-Year (for a period of 10 years).

I have no knowledge of Perl programming, so I don't know if this is a complicated question.

I would really appreciate it if someone could steer me in the right direction because right now I don't know where to start.

Thank You!

Hannah

Perl string used in research "Power of the Pen and Executive Compensation":

(CEO name or CEO name’s or executive* or CEO*) near25

(high* near7 (salar* or bonus* or pay* or paid or compensat* or benefit*)

or excess* near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or lofty near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or hefty near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or large adj7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or rich near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or big* near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or outsize* near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or huge near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or generous near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or exorbitant* near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or fat* near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or gargantuan near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or bonanza* near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or jumbo near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or whopp* near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or astound* near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or ridiculous* near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or stagger* near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or handsome* near7 (salar* or bonus* or pay* or paid or compensat* or benefit* or option*)

or lucrative near7 (pay* or compensat* or option*)

or critic* near7 (pay* or compensat*)

or best near7 paid

or reap* adj7 million*

or self-serving

or largesse

or overpaid

or lavish

or perks

or perquisites

or windfall*

or earn* more than

or was paid more than

or receiv* more than

or made more than)

Re: Data collection from Factiva using Perl string
by GotToBTru (Prior) on Sep 03, 2014 at 16:28 UTC

    I don't know who identified that search format as a "Perl search string"; I don't recognize it as related to Perl. Perhaps Factiva uses Perl behind the scenes to do its searching. In particular, the near7 and near25 operators are not Perl.

    Are you getting a response from Factiva for this search and need help organizing or picking out the particular data? It would be helpful to see what you're getting back.

    If your query is not returning the data you need, that is more of a question for Factiva than this forum.

    1 Peter 4:10
      Hi,

      Thank you so much for responding.

      The researchers in the paper identified it as "Perl String". It indeed does not look like the Perl tutorials I have read/watched.

      When I fill in the keywords I do get the output I need, but it is specified as #articles per Company per year. What I need is #articles per Company per CEO for 10 years.

      Could the authors of the paper maybe mean that it is possible to organize the output data with Perl?

      This is how the authors describe the Perl program:

      "To measure negative publicity about CEO compensation, we iteratively develop a Perl program to process the text of each article about CEO compensation to assess whether the article has a negative tone. The input into the Perl program consists of a set of negative tone keywords and phrases. This set of keywords and phrases was developed from manually reading approximately 200 articles about CEO compensation, where the articles included both randomly selected firms and firms widely known to have received negative publicity"
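For illustration only, here is a rough sketch of what a program like the one described in that quote might do: read the text of each article, check whether a negative-tone word falls within a few words of a pay-related word and close to a mention of the CEO, and then count the flagged articles per company, CEO, and year. It is written in Python rather than Perl purely to keep it short, and it is not the authors' program. Everything in it is an assumption: the function names, the small subset of keywords, the reading of `nearN` as "within N words in either direction", and the invented sample articles.

```python
import re
from collections import Counter

# Small illustrative subset of the keyword lists from the query above.
MODIFIERS = ["high*", "excess*", "hefty", "huge", "generous"]
PAY_TERMS = ["salar*", "bonus*", "pay*", "paid", "compensat*", "benefit*", "option*"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


def matches(token, pattern):
    """Treat a trailing * as a prefix wildcard, e.g. salar* matches salary."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def positions(tokens, patterns):
    """Indexes of all tokens that match any of the patterns."""
    return [i for i, t in enumerate(tokens) if any(matches(t, p) for p in patterns)]


def negative_hits(tokens, n=7):
    """Positions of modifiers that have a pay term within n words (assumed near7)."""
    pays = positions(tokens, PAY_TERMS)
    return [m for m in positions(tokens, MODIFIERS)
            if any(p != m and abs(p - m) <= n for p in pays)]


def is_negative(text, ceo_terms, inner=7, outer=25):
    """True if a CEO term lies within `outer` words of a modifier/pay-term hit."""
    tokens = tokenize(text)
    ceo = positions(tokens, ceo_terms)
    hits = negative_hits(tokens, inner)
    return any(abs(c - h) <= outer for c in ceo for h in hits)


def count_negative(articles):
    """Tally flagged articles per (company, CEO, year)."""
    counts = Counter()
    for a in articles:
        surname = a["ceo"].split()[-1].lower()
        ceo_terms = [surname, surname + "'s", "executive*", "ceo*"]
        if is_negative(a["text"], ceo_terms):
            counts[(a["company"], a["ceo"], a["year"])] += 1
    return counts


# Invented sample data, for demonstration only.
SAMPLE_ARTICLES = [
    {"company": "Acme", "ceo": "Jane Doe", "year": 2012,
     "text": "Shareholders criticized the hefty bonus awarded to CEO Jane Doe last year."},
    {"company": "Acme", "ceo": "Jane Doe", "year": 2012,
     "text": "Analysts called Doe's compensation excessive, a huge payout by any measure."},
    {"company": "Acme", "ceo": "Jane Doe", "year": 2013,
     "text": "Jane Doe opened a new factory in Ohio."},
]

if __name__ == "__main__":
    for key, n in sorted(count_negative(SAMPLE_ARTICLES).items()):
        print(key, n)
```

A real version would need the full article text exported from Factiva, the complete keyword list, and a way to tell which CEO was in office in which year. Whether Factiva allows that kind of export is a question for Factiva or for your university library.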