Comparing graph shapes or soft matching.

Re: Comparing graph shapes or soft matching. by hippo (Archbishop) on Jun 05, 2024 at 12:55 UTC
> a discernable difference

If you calculate the mean and standard deviation, you can then flag any values more than n standard deviations away from the mean. It's up to you to choose the value of n that matches your own level of tolerance.

🦛

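A minimal sketch of this in Perl, assuming the values are plain numbers in an array (for the row-count problem, perhaps the per-sample difference between two databases). It uses the sample standard deviation; `flag_outliers` and the example data are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Return the indices of the values that lie more than $n standard
# deviations away from the mean. Choosing $n is up to you.
sub flag_outliers {
    my ($values, $n) = @_;
    my $count = @$values;
    return () if $count < 2;    # a sample SD needs at least two values

    my $mean = sum(@$values) / $count;
    # Sample standard deviation (divides by $count - 1).
    my $sd = sqrt( sum( map { ($_ - $mean) ** 2 } @$values ) / ($count - 1) );
    return () if $sd == 0;      # all values equal: nothing stands out

    return grep { abs( $values->[$_] - $mean ) > $n * $sd } 0 .. $count - 1;
}

# Hypothetical example: per-sample differences between two row counts.
my @diffs = (0, 1, 0, 2, 1, 0, 40, 1);
print join(', ', flag_outliers(\@diffs, 2)), "\n";    # prints "6"
```

Note that a single large outlier also inflates the standard deviation it is measured against, so with only a few samples a genuine problem can stay under the threshold.
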
Re: Comparing graph shapes or soft matching. by etj (Priest) on Jun 05, 2024 at 17:03 UTC
Based on the far-from-specific-enough description, I am interpreting this as suitable for PDL's `approx` function, which broadcasts over each element of the two inputs and sets the corresponding output element to either 1 (close enough) or 0 (not close enough). E.g.:

```perl
use PDL;
$close_enough_all = approx(map PDL->topdl($_), $in1, $in2)->all;
```

Otherwise, "graph shapes" might be anything: statistical similarity, gradient similarity, ...

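The same idea without PDL, as a pure-Perl sketch: `all_close` is a hypothetical helper, and the absolute tolerance `$eps` is an assumption you would tune (for row counts, a number of rows). Like `->all` above, it only answers "are all samples close?", not which ones differ.

```perl
use strict;
use warnings;
use List::Util 1.33 qw(all);   # all() first appeared in List::Util 1.33

# True (1) if both series have the same length and every pair of
# corresponding elements differs by at most $eps; false (0) otherwise.
sub all_close {
    my ($x, $y, $eps) = @_;
    return 0 unless @$x == @$y;
    my $ok = all { abs( $x->[$_] - $y->[$_] ) <= $eps } 0 .. $#$x;
    return $ok ? 1 : 0;
}

printf "%d\n", all_close([100, 200, 300], [102, 199, 300], 5);   # prints 1
printf "%d\n", all_close([100, 200, 300], [100, 250, 300], 5);   # prints 0
```
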
Re: Comparing graph shapes or soft matching. by bliako (Abbot) on Jun 06, 2024 at 08:49 UTC
What you are probably asking about is finding outliers in your data. This may be of help: How to best eliminate values in a list that are outliers. Grubbs's test is offered in Statistics::Descriptive. I have to admit you did a good job of obfuscating your question.

bw, bliako

Re: Comparing graph shapes or soft matching. by erix (Prior) on Jun 05, 2024 at 12:32 UTC
Probably not useful for your problem right now, but some databases can do RPR (Row Pattern Recognition), which seems able to do this kind of thing. (It's in the pipeline for Postgres. You use Netcool -> IBM -> DB2? Alas, I don't see it in DB2 either.)

Re: Comparing graph shapes or soft matching. by LanX (Saint) on Jun 05, 2024 at 13:34 UTC
If "graphed" is supposed to mean a "plotted" function graph, then try the mean squared error, or a similar averaged distance metric that reflects your needs.

Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery

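A minimal Perl sketch of the mean squared error, assuming both series have already been sampled at the same time points (`mean_squared_error` is a hypothetical name). Its square root (RMSE) is in the same units as the data, here rows.

```perl
use strict;
use warnings;

# Mean squared error of two series sampled at the same points:
# the average of (x_i - y_i)**2. 0 means identical curves.
sub mean_squared_error {
    my ($x, $y) = @_;
    die "series must be non-empty and of equal length\n"
        unless @$x && @$x == @$y;
    my $sum = 0;
    $sum += ( $x->[$_] - $y->[$_] ) ** 2 for 0 .. $#$x;
    return $sum / @$x;
}

my $mse = mean_squared_error([10, 20, 30], [10, 22, 27]);
printf "MSE %.3f, RMSE %.3f\n", $mse, sqrt $mse;    # MSE 4.333, RMSE 2.082
```

Because it averages over the whole window, a short large spike and a long small gap can give similar values, so on its own it does not distinguish "briefly apart" from "out of sync for several samples in a row".
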
Re: Comparing graph shapes or soft matching. by Danny (Chaplain) on Jun 05, 2024 at 13:50 UTC
I'm picturing that you have two arrays of integers. Can you just count the number of differences? For example:

```perl
use strict;
use warnings;

my (@listA, @listB);    # the two lists of integers to compare
my (%countsA, %countsB, %seen);

foreach my $value (@listA) {
    $countsA{$value}++;
    $seen{$value} = 1;
}

foreach my $value (@listB) {
    $countsB{$value}++;
    $seen{$value} = 1;
}

# Add up how far apart the two counts are for every value seen in either
# list; a value missing from one list has a count of 0 there.
my $diff = 0;
foreach my $value (keys %seen) {
    $diff += abs( ($countsA{$value} // 0) - ($countsB{$value} // 0) );
}
```

EDIT: If you are expecting all lists to be similar to some global target, you could calculate the mean count of each value over all sets. Then, instead of comparing two sets (as above), you could compare each set to the means.

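The EDIT's idea, sketched in Perl under the assumption that each set is an array ref of integers and that a value absent from a set counts as 0 there; `value_counts`, `mean_counts` and `distance_from_mean` are hypothetical helpers. Like the code above, it compares how often each value occurs, not where in the sequence it occurs.

```perl
use strict;
use warnings;

# How often each value occurs in one list.
sub value_counts {
    my ($list) = @_;
    my %counts;
    $counts{$_}++ for @$list;
    return \%counts;
}

# Mean count of each value over all lists (absent values count as 0).
sub mean_counts {
    my @lists = @_;
    my %total;
    for my $list (@lists) {
        my $counts = value_counts($list);
        $total{$_} += $counts->{$_} for keys %$counts;
    }
    my %means;
    $means{$_} = $total{$_} / @lists for keys %total;
    return \%means;
}

# Sum of absolute differences between one list's counts and the mean counts.
sub distance_from_mean {
    my ($list, $means) = @_;
    my $counts = value_counts($list);
    my %values = map { $_ => 1 } keys %$counts, keys %$means;
    my $distance = 0;
    $distance += abs( ($counts->{$_} // 0) - ($means->{$_} // 0) ) for keys %values;
    return $distance;
}

# Hypothetical example sets.
my @sets  = ([1, 1, 2], [1, 2, 2], [1, 2]);
my $means = mean_counts(@sets);
printf "set %d: %.3f\n", $_, distance_from_mean($sets[$_], $means) for 0 .. $#sets;
```
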
Re: Comparing graph shapes or soft matching. by choroba (Cardinal) on Jun 05, 2024 at 12:27 UTC
What do you mean by "line up"? Can you give an example of normal behaviour versus the behaviour that should be reported?

```
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
```

Re: Comparing graph shapes or soft matching. by LanX (Saint) on Jun 05, 2024 at 12:59 UTC
> but if graphed, the lines should line up or nearly line up.

Very fuzzy problem description.

> I am not a maths expert

I am in the (worst) case.

The subgraph isomorphism problem is NP-complete, and the complexity of the graph isomorphism problem is still unresolved.

This doesn't mean that you won't find a practical algorithm for over 90 percent of the cases. But once you hit a hard one, it will take ages to complete.

Of course this could (likely) be an easy XY problem*, but without an SSCCE how can we possibly tell? 🤷🏻‍♂️

Cheers Rolf

*) graph vs chart, apples vs oranges, tomato vs potato, 🥔 vs 🍅, ...

Re: Comparing graph shapes or soft matching. by tweetiepooh (Hermit) on Jun 28, 2024 at 08:35 UTC
Sorry for being too vague. I guess it is one of those situations where the question seems so obvious to me, even from the initial statement, that I can't believe no-one else sees it that way.

What I have is a row count from a number of databases, plotted against time. Rows are added to one database and copied to the others. In a normal situation the graphs will line up or nearly line up: the row count in each database will be the same or nearly the same. However, because copy and sample times can vary, the row counts on one or more databases can lag behind. (Additionally, lots of rows can be added to and then removed from the main database before the copy mechanism kicks in, so those rows may never reach one or more of the copies at all; this is not a problem.)

An error needs to be reported when the counts get out of sync and stay out of sync, which shows on the plots as lines that don't line up at all. The error can "clear" if and when the plots line up again (or nearly so).

Thinking things over, I would need to line up the time values first, then compare the row counts, and report when they differ by enough to worry about more than a certain number of times in succession.

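A minimal Perl sketch of that plan, under stated assumptions: each database's samples are an array ref of `[epoch_seconds, row_count]` pairs sorted by time; a replica is lined up with the main database by taking its latest sample at or before each main-database timestamp; and `$tolerance` (rows) and `$min_run` (consecutive samples) are thresholds you would pick. `align_samples`, `find_out_of_sync` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Line up a replica with the main database: for every main-database sample,
# use the replica's latest sample taken at or before that time (last value
# carried forward). Main samples older than the first replica sample are
# skipped. Returns [time, main_count, replica_count] triples.
sub align_samples {
    my ($main, $replica) = @_;
    my $j = 0;
    my $last;             # latest replica count seen so far
    my @aligned;
    for my $sample (@$main) {
        my ($time, $count) = @$sample;
        while ( $j < @$replica && $replica->[$j][0] <= $time ) {
            $last = $replica->[$j][1];
            $j++;
        }
        push @aligned, [ $time, $count, $last ] if defined $last;
    }
    return \@aligned;
}

# Report episodes where the counts differ by more than $tolerance for at
# least $min_run consecutive samples. An episode clears at the first sample
# that is back within tolerance. Returns [start_time, clear_time] pairs;
# clear_time is undef if the counts are still out of sync at the end.
sub find_out_of_sync {
    my ($aligned, $tolerance, $min_run) = @_;
    die "min_run must be at least 1\n" if $min_run < 1;
    my @episodes;
    my $run = 0;          # length of the current out-of-tolerance streak
    my $run_start;        # time the current streak began
    for my $sample (@$aligned) {
        my ($time, $main_count, $replica_count) = @$sample;
        if ( abs( $main_count - $replica_count ) > $tolerance ) {
            $run_start = $time if $run == 0;
            $run++;
        }
        else {
            push @episodes, [ $run_start, $time ] if $run >= $min_run;
            $run = 0;
        }
    }
    push @episodes, [ $run_start, undef ] if $run >= $min_run;
    return \@episodes;
}

# Hypothetical samples: the replica stalls at 110 rows, then catches up.
my @main    = ( [0, 100], [60, 110], [120, 120], [180, 130], [240, 140], [300, 150] );
my @replica = ( [0, 100], [55, 110], [115, 110], [175, 110], [235, 110], [295, 150] );

my $aligned = align_samples( \@main, \@replica );
for my $episode ( @{ find_out_of_sync( $aligned, 5, 3 ) } ) {
    my ( $from, $to ) = @$episode;
    printf "out of sync from t=%d until %s\n", $from,
        defined $to ? "t=$to" : 'end of data';
}
# prints "out of sync from t=120 until t=300"
```

The same calls can be repeated for each replica. If an alarm flaps while the counts hover around the threshold, a separate, smaller "clear" tolerance (hysteresis) would be a natural refinement.
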
Re^2: Comparing graph shapes or soft matching. by LanX (Saint) on Jun 28, 2024 at 08:49 UTC
You obviously have a clear picture in mind, which is hard to communicate verbally.

The easiest way to ask this question is to show us some sample plots, the associated data and your interpretation of "alikeness".

Either as ASCII graphics in a code section, or by sharing a link to a freely hosted picture °

```
+     x
  * x +  etc
x   +
```

I already gave you the standard approach in mathematics, which is to calculate the mean of the squared differences.

You might tell us why this doesn't solve the problem.

Cheers Rolf

Update

°) Another possibility is sharing a Google doc spreadsheet including data and plotting

Re^3: Comparing graph shapes or soft matching. by cavac (Prior) on Jun 28, 2024 at 16:36 UTC
If the data can be exported to something like CSV, it can be plotted with gnuplot as nice ASCII art.

Example data (incomplete):

```
date_trunc;temperature;humidity;oventemperature
2024-06-28 16:25:31;22.5;99.9;27.2
2024-06-28 16:25:43;22.59;99.9;27.2
2024-06-28 16:25:53;22.59;99.9;27.2
2024-06-28 16:26:04;22.59;99.9;27.2
2024-06-28 16:26:15;22.59;99.9;27.2
2024-06-28 16:26:18;22.59;99.9;27.2
2024-06-28 16:26:28;22.59;99.9;27.2
2024-06-28 16:26:39;22.59;99.9;26.5
2024-06-28 16:26:49;22.59;99.9;27
...
```

Gnuplot file for this:

```gnuplot
set title "Bedroom\nTemperature BME"
set datafile separator ";"

# Set virtual terminal size to 120x30 and print to STDOUT
set terminal dumb size 120,30

set xlabel "Logtime"
set ylabel "Sensorvalue"

set xdata time
#set timefmt "%Y-%m-%d"
#set xrange ["03/21/95":"03/22/95"]
set format x "%d.%m\n%H:%M"
set timefmt "%Y-%m-%d %H:%M:%S"

set key autotitle columnhead outside

plot "bme.csv" using 1:2 title "Room temp(Celsius)" with lines, \
     "dht.csv" using 1:4 title "Oven temp (Celsius)" with lines
```

`"bme.csv" using 1:2` specifies the filename and which two columns to use for a given plot line (the first one is always the timestamp in my case). You can use the same file with different columns, or use different files altogether.

Result:

[ASCII line plot "Bedroom / Temperature BME": Sensorvalue (22 to 28) against Logtime (28.06, 16:15 to 18:30), with "Room temp(Celsius)" drawn as `*` and "Oven temp (Celsius)" drawn as `#`]

PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
Also check out my sister's artwork and my rather simple sketches/one-panel comics

Re^3: Comparing graph shapes or soft matching.(Sharing spreadsheets/ line plots) by LanX (Saint) on Jun 29, 2024 at 10:41 UTC
> Another possibility is sharing a Google doc spreadsheet including data and plotting

See How to Make a Line Graph in Google Sheets and insert it in a Google Doc (YouTube).

Documents in Google Drive can be read publicly by setting the access rights and sharing the link.

Like this, you can attach several MBs of data plus the line charts you want to highlight.

Cheers Rolf
