If I wanted to scrape a website like this, I would use JScript on Windows to automate the process. The script would open the website in a browser, press Ctrl+A to select everything, copy it, read the text from the clipboard and save it (see the working example below). Once it is saved, another script would parse the text, strip the unnecessary parts and keep only what I want.

Or I would explore the website a little more to see whether it offers a printer-friendly page, a mobile version, or an API that returns the results quickly. At first glance, the site does appear to have a mobile version that uses no JavaScript, so I would just use wget to download it and then parse the source code. That would be the easiest, I think. Here is a link to one of the mobile pages of this site; as you can see, it is pretty simple to grab the content: https://www.flashscore.mobi/standings/0bL6Acw6/thxTanK4/
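To give an idea of the "download it and then parse the source code" route, here is a rough sketch of the parsing half in plain JavaScript (tried with Node.js in mind). It only reduces HTML to plain text. The function name `htmlToText` and the sample markup with "Team A" and "Team B" are made up for illustration; I have not inspected the real markup of the mobile page, so treat this as a starting point rather than a finished parser.

```javascript
// Sketch: reduce downloaded HTML (for example a file saved with wget) to plain text.
// ASSUMPTION: the sample markup below is invented; the real page may differ.
function htmlToText(html) {
  var text = String(html);
  // Drop <script> and <style> elements together with their contents.
  text = text.replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ");
  // Turn <br> and the end of common block elements into line breaks.
  text = text.replace(/<br\s*\/?>|<\/(p|div|tr|li|h[1-6])>/gi, "\n");
  // Replace every remaining tag with a space.
  text = text.replace(/<[^>]+>/g, " ");
  // Decode a few common entities; &amp; goes last to avoid double decoding.
  text = text.replace(/&nbsp;/g, " ").replace(/&lt;/g, "<")
             .replace(/&gt;/g, ">").replace(/&amp;/g, "&");
  // Tidy up whitespace.
  text = text.replace(/[ \t]+/g, " ");
  text = text.replace(/ ?\n ?/g, "\n");
  text = text.replace(/\n+/g, "\n");
  return text.replace(/^\s+|\s+$/g, "");
}

// Hypothetical sample input, not copied from the real site.
var sampleHtml = "<html><body><h4>Standings</h4><table>" +
  "<tr><td>1.</td><td>Team A</td><td>98</td></tr>" +
  "<tr><td>2.</td><td>Team B</td><td>97</td></tr>" +
  "</table></body></html>";
var sampleText = htmlToText(sampleHtml);
```

Each table row ends up on its own line, which makes the later "keep only what I want" step a matter of splitting on line breaks.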

Here is a working program that downloads a snapshot of the page on Windows. Save this code as "Scraper.js" on your desktop:

var URL = "https://www.flashscore.com/football/england/premier-league-2018-2019/results/"; // <= WEBSITE YOU WANT TO SCRAPE
var WEB_BROWSER = "C:\\BIN\\SUPERMIUM\\chrome.exe"; // <= PUT YOUR WEB BROWSER'S FULL NAME AND PATH HERE
var WEB_BROWSER_NAME = "Supermium"; // <= PUT YOUR WEB BROWSER'S SHORT NAME HERE

var FSO, WshShell;
try
{
  FSO = new ActiveXObject("Scripting.FileSystemObject");
  WshShell = WScript.CreateObject("WScript.Shell");
}
catch (e)
{
  ABORT("The script cannot access the file system!");
}

ContinueOrExit("This script will try to open the web browser " + WEB_BROWSER_NAME + ", load a website and\n" +
  "copy the contents and save it in a file called Scraper.txt.");

ALERT("This program will run for approx. 20 seconds after\nyou press the OK button, and then\nit will terminate and say \"done\".");

RUN(QUOTE(WEB_BROWSER) + " " + QUOTE(URL));
WAIT(5);
WINFOCUS(WEB_BROWSER_NAME);
WAIT(10);            // WAIT FOR PAGE TO FULLY LOAD. GIVE IT ABOUT 10 SECONDS.
PRESS("{ESC}");      // SEND ESC KEYSTROKE TO CLOSE POPUP AD ON THE WEBSITE
WAIT(0.2);
PRESS("^{a}");       // SEND CTRL+A KEYSTROKE (SELECT ALL)
WAIT(0.05);
PRESS("^{a}");       // SEND CTRL+A KEYSTROKE AGAIN JUST IN CASE
WAIT(0.2);
PRESS("^{INSERT}");  // SEND CTRL+INSERT KEYSTROKE (COPY TEXT)
WAIT(0.2);

var TEXT = ReadClipboardText();
var CURDIR = GetCurDir(); // GET CURRENT DIRECTORY

// Here we remove everything that appears BEFORE the first occurrence of "Premier League"
// We only keep the text that appears after this text:
TEXT = StrAfter(TEXT, "Premier League");

// And here we remove everything that appears AFTER the last occurrence of "Show more matches"
// We only keep whatever we have before this text:
TEXT = StrBefore(TEXT, "Show more matches");

// So, we're left with the results, and we save that in a text file.
// Note: CreateFile() discards every character above U+00FF from the TEXT.
// If you want to keep Unicode characters in the output, then use the
// CreateUnicodeFile() function instead.
CreateFile(CURDIR + "Scraper.txt", TEXT);

SAY("done");
EXIT();

// END OF SCRIPT

////////////////////////////////////////////////////////////////////
//
// F U N C T I O N S
//

// This function displays a simple OK popup message box.
function ALERT(MSG) { WScript.Echo(MSG); }

// Display an error message and terminate the script.
function ABORT(MSG) { EXIT("Error: " + MSG); }

// Display a message (if one is given) and terminate the script.
function EXIT(MSG) { if (typeof(MSG) != "undefined") ALERT(MSG); WScript.Quit(0); }

// Show a popup box, display a Yes/No question, and return 1 for Yes or 0 otherwise.
function YES(QUESTION, TITLE)
{
  try
  {
    if (WshShell.Popup(QUESTION, 0, (typeof(TITLE) == "string" ? TITLE : "Question"), 36) == 6) return 1;
  }
  catch (e) {}
  return 0;
}

// Asks whether to continue the script or not. If the user selects No, the script terminates.
function ContinueOrExit(QUESTION) { if (!YES(QUESTION + "\n\nContinue?", "Do you want to continue?")) EXIT(); }

// This function returns the directory of this script with a backslash at the end.
function GetCurDir() { return WScript.ScriptFullName.replace(/\\[^\\]+$/, "\\"); }

// This function runs a separate program which is independent from this script.
function RUN(CMD) { WshShell.Run(CMD, 9); }

// This function transfers focus to the named application. It does not work in every situation.
// If it doesn't work, try to substitute it by sending the ALT+TAB keycode, which will bring
// the other running app into focus. Make sure there are no other windows open!
function WINFOCUS(W) { WshShell.AppActivate(W); }

// This function waits for N seconds and then continues execution of the script.
function WAIT(N) { WScript.Sleep(N * 1000); }

// This function uses the built-in text-to-speech engine of Windows to say a sentence in English.
function SAY(TEXT) { var VOICE = WScript.CreateObject("SAPI.SpVoice"); VOICE.Volume = 100; VOICE.Speak(TEXT); }

// This function sends a keypress to the application that currently has focus.
function PRESS(KEYCODE) { WAIT(0.1); WshShell.SendKeys(KEYCODE); }

// This function inserts double quotes around a string.
function QUOTE(S) { return '"' + S + '"'; }

// This function returns the section of string S that comes before the LAST occurrence
// of string M. The match is case sensitive. Returns "" if M is not found.
function StrBefore(S, M) { var E = S.lastIndexOf(M); return (E >= 0) ? S.substr(0, E) : ""; }

// This function returns the section of string S that comes after the FIRST occurrence
// of string M. The match is case sensitive. Returns "" if M is not found.
function StrAfter(S, M) { var E = S.indexOf(M); return (E >= 0) ? S.slice(E + M.length) : ""; }

////////////////////////////////////////////////////////////////////
//
// This function creates and overwrites a file in plain text ASCII mode.
// Returns 1 on success or 0 if an error occurred.
//
// Usage: INTEGER = CreateFile(FILENAME, STRING)
//
function CreateFile(FILENAME, S)
{
  S += "";

  // Ensure that no errors will occur during write:
  // Remove every character above U+00FF, because Write() would
  // "hiccup" on those when the file is open in plain text ASCII mode.
  S = S.replace(/[^\x00-\xFF]+/g, "");

  // Map the characters 0x80-0x9F to their Windows-1252 counterparts
  // (this assumes the system's ANSI code page is Windows-1252).
  // The replace callback receives the matched character first, so the
  // table index is computed from its character code.
  var CP1252 = "\u20AC\x81\u201A\u0192\u201E\u2026\u2020\u2021" +
               "\u02C6\u2030\u0160\u2039\u0152\x8D\u017D\x8F" +
               "\x90\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +
               "\u02DC\u2122\u0161\u203A\u0153\x9D\u017E\u0178";
  S = S.replace(/[\x80-\x9F]/g, function (c) { return CP1252.charAt(c.charCodeAt(0) - 128); });

  try
  {
    var FILE = FSO.CreateTextFile(FILENAME, 1, 0); // Create plain ASCII file.
    FILE.Write(S);
    FILE.Close();
    return 1;
  }
  catch (e) {}
  return 0;
}

////////////////////////////////////////////////////////////////////
//
// This function creates and overwrites a file with a string that
// may contain Unicode characters. The file will be saved in
// UTF-16LE format (with a byte order mark)! The function returns
// 1 on success or zero if an error occurred.
//
// Usage: INTEGER = CreateUnicodeFile(FILENAME, STRING)
//
function CreateUnicodeFile(FILENAME, S)
{
  try
  {
    var FILE = FSO.CreateTextFile(FILENAME, 1, 1); // Create UTF-16 file
    FILE.Write(S);
    FILE.Close();
    return 1;
  }
  catch (e) {}
  return 0;
}

////////////////////////////////////////////////////////////////////
//
// This function copies all text from the clipboard. We accomplish
// this by launching Internet Explorer in the background. We create
// a document with a textarea for user input. We paste whatever is
// on the clipboard into the textarea, and then we read its content
// and return the string. This process takes about 200ms.
//
// Usage: STRING = ReadClipboardText()
//
function ReadClipboardText()
{
  var MSIE = WScript.CreateObject("InternetExplorer.Application");
  MSIE.Visible = 0;
  MSIE.Navigate("about:blank");
  while (MSIE.Busy) WScript.Sleep(50); // Wait until the blank page is ready.
  var HTML = "<HTML><HEAD><TITLE></TITLE></HEAD><BODY onLoad='Init();' SCROLL=NO>" +
             "<FORM NAME=MAIN><TEXTAREA NAME=INPUT COLS=20 ROWS=5></TEXTAREA></FORM>" +
             "<SCRIPT> function Init() { document.MAIN.INPUT.focus(); } </SCRIPT></BODY></HTML>";
  MSIE.Document.open();
  MSIE.Document.write(HTML);
  MSIE.Document.close();
  WScript.Sleep(100);
  PRESS("+{INSERT}"); // SEND SHIFT+INSERT KEYSTROKE (PASTE)
  WScript.Sleep(60);
  var TEXT = MSIE.Document.MAIN.INPUT.value;
  MSIE.Quit();
  return TEXT;
}

////////////////////////////////////////////////////////////////////

And here is the output that it produces:

2018/2019 Advertisement SCORES NEWS LOGIN Advertisement Advertisement FOOTBALL ENGLAND Premier League Premier League 2018/2019 SUMMARY NEWS RESULTS FIXTURES STANDINGS ARCHIVE ENGLAND : Premier League Standings
ROUND 38
12.05. 08:00 Brighton Brighton Manchester City Manchester City 1 4
12.05. 08:00 Burnley Burnley Arsenal Arsenal 1 3
12.05. 08:00 Crystal Palace Crystal Palace Bournemouth Bournemouth 5 3
12.05. 08:00 Fulham Fulham Newcastle Newcastle 0 4
12.05. 08:00 Leicester Leicester Chelsea Chelsea 0 0
12.05. 08:00 Liverpool Liverpool Wolves Wolves 2 0
12.05. 08:00 Manchester Utd Manchester Utd Cardiff Cardiff 0 2
12.05. 08:00 Southampton Southampton Huddersfield Huddersfield 1 1
12.05. 08:00 Tottenham Tottenham Everton Everton 2 2
12.05. 08:00 Watford Watford West Ham West Ham 1 4
ROUND 37
06.05. 13:00 Manchester City Manchester City Leicester Leicester 1 0
05.05. 09:30 Arsenal Arsenal Brighton Brighton 1 1
05.05. 07:00 Chelsea Chelsea Watford Watford 3 0
05.05. 07:00 Huddersfield Huddersfield Manchester Utd Manchester Utd 1 1
04.05. 12:45 Newcastle Newcastle Liverpool Liverpool 2 3
04.05. 10:30 Cardiff Cardiff Crystal Palace Crystal Palace 2 3
04.05. 08:00 West Ham West Ham Southampton Southampton 3 0
04.05. 08:00 Wolves Wolves Fulham Fulham 1 0
04.05. 05:30 Bournemouth Bournemouth Tottenham Tottenham 2 1 0
03.05. 13:00 Everton Everton Burnley Burnley 2 0
ROUND 36
28.04. 09:30 Manchester Utd Manchester Utd Chelsea Chelsea 1 1
28.04. 07:05 Burnley Burnley Manchester City Manchester City 0 1
28.04. 05:00 Leicester Leicester Arsenal Arsenal 3 0
27.04. 10:30 Brighton Brighton Newcastle Newcastle 1 1
27.04. 08:00 Crystal Palace Crystal Palace Everton Everton 0 0
27.04. 08:00 Fulham Fulham Cardiff Cardiff 1 0
27.04. 08:00 Southampton Southampton Bournemouth Bournemouth 3 3
27.04. 08:00 Watford Watford Wolves Wolves 1 2
27.04. 05:30 Tottenham Tottenham West Ham West Ham 0 1
26.04. 13:00 Liverpool Liverpool Huddersfield Huddersfield 5 0
ROUND 31
24.04. 13:00 Manchester Utd Manchester Utd Manchester City Manchester City 0 2
24.04. 12:45 Wolves Wolves Arsenal Arsenal 3 1
ROUND 33
23.04. 12:45 Tottenham Tottenham Brighton Brighton 1 0
ROUND 31
23.04. 12:45 Watford Watford Southampton Southampton 1 1
ROUND 35
22.04. 13:00 Chelsea Chelsea Burnley Burnley 2 2
21.04. 09:00 Arsenal Arsenal Crystal Palace Crystal Palace 2 3
21.04. 09:00 Cardiff Cardiff Liverpool Liverpool 0 2
21.04. 06:30 Everton Everton Manchester Utd Manchester Utd 4 0
20.04. 10:30 Newcastle Newcastle Southampton Southampton 3 1
20.04. 08:00 Bournemouth Bournemouth Fulham Fulham 0 1
20.04. 08:00 Huddersfield Huddersfield Watford Watford 1 2
20.04. 08:00 West Ham West Ham Leicester Leicester 2 2
20.04. 08:00 Wolves Wolves Brighton Brighton 0 0
20.04. 05:30 Manchester City Manchester City Tottenham Tottenham 1 0
ROUND 31
16.04. 12:45 Brighton Brighton Cardiff Cardiff 0 2
ROUND 34
15.04. 13:00 Watford Watford Arsenal Arsenal 0 1
14.04. 09:30 Liverpool Liverpool Chelsea Chelsea 2 0
14.04. 07:05 Crystal Palace Crystal Palace Manchester City Manchester City 1 3
13.04. 10:30 Manchester Utd Manchester Utd West Ham West Ham 2 1
13.04. 08:00 Brighton Brighton Bournemouth Bournemouth 0 5
13.04. 08:00 Burnley Burnley Cardiff Cardiff 2 0
13.04. 08:00 Fulham Fulham Everton Everton 2 0
13.04. 08:00 Southampton Southampton Wolves Wolves 3 1
13.04. 05:30 Tottenham Tottenham Huddersfield Huddersfield 4 0
12.04. 13:00 Leicester Leicester Newcastle Newcastle 0 1
ROUND 33
08.04. 13:00 Chelsea Chelsea West Ham West Ham 2 0
07.04. 07:05 Everton Everton Arsenal Arsenal 1 0
06.04. 08:00 Bournemouth Bournemouth Burnley Burnley 1 3
06.04. 08:00 Huddersfield Huddersfield Leicester Leicester 1 4
06.04. 08:00 Newcastle Newcastle Crystal Palace Crystal Palace 0 1
05.04. 13:00 Southampton Southampton Liverpool Liverpool 1 3
ROUND 27
03.04. 12:45 Chelsea Chelsea Brighton Brighton 3 0
ROUND 33
03.04. 12:45 Manchester City Manchester City Cardiff Cardiff 2 0
ROUND 31
03.04. 12:45 Tottenham Tottenham Crystal Palace Crystal Palace 2 0
ROUND 33
02.04. 12:45 Watford Watford Fulham Fulham 4 1
02.04. 12:45 Wolves Wolves Manchester Utd Manchester Utd 2 1
ROUND 32
01.04. 13:00 Arsenal Arsenal Newcastle Newcastle 2 0
31.03. 09:30 Liverpool Liverpool Tottenham Tottenham 2 1
31.03. 07:05 Cardiff Cardiff Chelsea Chelsea 1 2
30.03. 11:30 West Ham West Ham Everton Everton 0 2
30.03. 09:00 Brighton Brighton Southampton Southampton 0 1
30.03. 09:00 Burnley Burnley Wolves Wolves 2 0
30.03. 09:00 Crystal Palace Crystal Palace Huddersfield Huddersfield 2 0
30.03. 09:00 Leicester Leicester Bournemouth Bournemouth 2 0
30.03. 09:00 Manchester Utd Manchester Utd Watford Watford 2 1
30.03. 06:30 Fulham Fulham Manchester City Manchester City 0 2
ROUND 31
17.03. 10:30 Everton Everton Chelsea Chelsea 2 0
17.03. 08:15 Fulham Fulham Liverpool Liverpool 1 2
16.03. 09:00 Bournemouth Bournemouth Newcastle Newcastle 2 2
16.03. 09:00 Burnley Burnley Leicester Leicester 1 2
16.03. 09:00 West Ham West Ham Huddersfield Huddersfield 4 3
ROUND 30
10.03. 10:30 Arsenal Arsenal Manchester Utd Manchester Utd 2 0
10.03. 08:05 Chelsea Chelsea Wolves Wolves 1 1
10.03. 06:00 Liverpool Liverpool Burnley Burnley 4 2
09.03. 11:30 Manchester City Manchester City Watford Watford 3 1
09.03. 09:00 Cardiff Cardiff West Ham West Ham 2 0
09.03. 09:00 Huddersfield Huddersfield Bournemouth Bournemouth 0 2
09.03. 09:00 Leicester Leicester Fulham Fulham 3 1
09.03. 09:00 Newcastle Newcastle Everton Everton 3 2
09.03. 09:00 Southampton Southampton Tottenham Tottenham 2 1
09.03. 06:30 Crystal Palace Crystal Palace Brighton Brighton 1 2
ROUND 29
03.03. 10:15 Everton Everton Liverpool Liverpool 0 0
03.03. 08:05 Fulham Fulham Chelsea Chelsea 1 2
03.03. 06:00 Watford Watford Leicester Leicester 2 1
02.03. 11:30 West Ham West Ham Newcastle Newcastle 2 0
02.03. 09:00 Bournemouth Bournemouth Manchester City Manchester City 0 1
02.03. 09:00 Brighton Brighton Huddersfield Huddersfield 1 0
02.03. 09:00 Burnley Burnley Crystal Palace Crystal Palace 1 3
02.03. 09:00 Manchester Utd Manchester Utd Southampton Southampton 3 2
02.03. 09:00 Wolves Wolves Cardiff Cardiff 2 0
02.03. 06:30 Tottenham Tottenham Arsenal Arsenal 1 1
ROUND 28
27.02. 14:00 Chelsea Chelsea Tottenham Tottenham 2 0
27.02. 14:00 Crystal Palace Crystal Palace Manchester Utd Manchester Utd 1 3
27.02. 14:00 Liverpool Liverpool Watford Watford 5 0
27.02. 14:00 Manchester City Manchester City West Ham West Ham 1 0
27.02. 13:45 Arsenal Arsenal Bournemouth Bournemouth 5 1
27.02. 13:45 Southampton Southampton Fulham Fulham 2 0
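For the second step (parsing the saved text and keeping only what you want), something along these lines might do. It is a sketch in plain JavaScript, tried with Node.js in mind, and the names `parseResults` and `splitTeams` and the record fields are my own invention. It rests on assumptions read off the output above: every match looks like "DD.MM. HH:MM Home Home Away Away numbers", each team name appears twice in a row, and the last two numbers are the home and away goals. One line above carries three numbers; what the extra one means is a guess on my part (possibly a card count), so the sketch simply ignores it.

```javascript
// Sketch: turn the saved Scraper.txt text into match records.
// ASSUMPTIONS: duplicated team names, and the LAST two numbers are the score.

// Split the words of "A A B B" into ["A", "B"]; returns null if that pattern fails.
function splitTeams(words) {
  for (var h = 1; 2 * h < words.length; h++) {
    var home = words.slice(0, h).join(" ");
    if (words.slice(h, 2 * h).join(" ") !== home) continue;
    var rest = words.slice(2 * h);
    if (rest.length % 2 !== 0) continue;
    var away = rest.slice(0, rest.length / 2).join(" ");
    if (rest.slice(rest.length / 2).join(" ") === away) return [home, away];
  }
  return null;
}

function parseResults(text) {
  var matches = [];
  var round = null;
  // Normalize whitespace so flattened and line-per-match text parse alike.
  var flat = String(text).replace(/\s+/g, " ").replace(/^ | $/g, "");
  // Either a "ROUND n" header, or one match running up to the next date or header.
  var re = /ROUND (\d+)|(\d\d\.\d\d\.) (\d\d:\d\d) (.*?)(?= ROUND \d| \d\d\.\d\d\. |$)/g;
  var m;
  while ((m = re.exec(flat)) !== null) {
    if (m[1]) { round = parseInt(m[1], 10); continue; }
    // Separate the team names from the trailing run of numbers.
    var body = /^(.*?)((?: \d+)+)$/.exec(m[4]);
    if (!body) continue;
    var teams = splitTeams(body[1].split(" "));
    var nums = body[2].replace(/^ /, "").split(" ");
    if (!teams || nums.length < 2) continue;
    matches.push({
      round: round, date: m[2], time: m[3],
      home: teams[0], away: teams[1],
      homeGoals: parseInt(nums[nums.length - 2], 10),
      awayGoals: parseInt(nums[nums.length - 1], 10)
    });
  }
  return matches;
}

// Usage with three lines taken from the output above.
var sample = "ROUND 38 12.05. 08:00 Brighton Brighton Manchester City Manchester City 1 4 " +
  "ROUND 31 24.04. 13:00 Manchester Utd Manchester Utd Manchester City Manchester City 0 2 " +
  "ROUND 37 04.05. 05:30 Bournemouth Bournemouth Tottenham Tottenham 2 1 0";
var rows = parseResults(sample);
```

The page header ("Advertisement SCORES NEWS ...") is skipped automatically because it contains neither a "ROUND n" header nor a date.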

In reply to Re: Web scrapping by harangzsolt33
in thread Web scrapping by joyfedl
