Re: Comparing graph shapes or soft matching.
by hippo (Archbishop) on Jun 05, 2024 at 12:55 UTC
|
a discernable difference
If you calculate the mean and standard deviation you can then flag any values greater than n standard deviations away from the mean. It's up to you to choose your value of n for your own level of tolerance.
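A minimal sketch of that idea, assuming the values are plain numbers in a Perl array. The helper name flag_outliers and the sample numbers are made up for illustration, and the sample standard deviation (dividing by N - 1) is one choice among several:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: return the indices of values that lie more than
# $n standard deviations away from the mean of @values.
sub flag_outliers {
    my ($n, @values) = @_;
    return () if @values < 2;
    my $mean = sum(@values) / @values;
    # Sample standard deviation (divides by N - 1).
    my $sd = sqrt( sum(map { ($_ - $mean) ** 2 } @values) / (@values - 1) );
    return () if $sd == 0;    # all values identical, nothing to flag
    return grep { abs($values[$_] - $mean) > $n * $sd } 0 .. $#values;
}

my @counts  = (100, 102, 99, 101, 100, 180, 98);    # made-up sample data
my @flagged = flag_outliers(2, @counts);
print "outlier at index $_ ($counts[$_])\n" for @flagged;
```

With n = 2 only the value 180 is flagged here. Note that a large outlier inflates the standard deviation itself, so a very small n or a very short list can behave unintuitively.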
| [reply] |
Re: Comparing graph shapes or soft matching.
by etj (Priest) on Jun 05, 2024 at 17:03 UTC
|
Based on the far-from-specific-enough description, I am interpreting this as suitable for PDL's approx function, which would broadcast over each element of two inputs, and set in the output either 1 (close enough) or 0 (not that). E.g.:
use PDL;
$close_enough_all = approx(map PDL->topdl($_), $in1, $in2)->all;
Otherwise, "graph shapes" might be anything: statistical similarity, gradient similarity, ... | [reply] [d/l] [select] |
Re: Comparing graph shapes or soft matching.
by bliako (Abbot) on Jun 06, 2024 at 08:49 UTC
|
| [reply] |
Re: Comparing graph shapes or soft matching.
by erix (Prior) on Jun 05, 2024 at 12:32 UTC
|
Probably not useful for your problem now, but some databases can do RPR (Row Pattern Recognition) which seems to be able to do such things. (it's in the pipeline for postgres. You use Netcool -> IBM -> DB2? Alas, I don't see it in DB2 either.)
| [reply] |
Re: Comparing graph shapes or soft matching.
by LanX (Saint) on Jun 05, 2024 at 13:34 UTC
|
| [reply] |
Re: Comparing graph shapes or soft matching.
by Danny (Chaplain) on Jun 05, 2024 at 13:50 UTC
|
I'm picturing that you have two arrays of integers. Can you just count the number of differences? For example:
foreach $value (@listA) {
    $countsA{$value}++;
    $seen{$value} = 1;
}
foreach $value (@listB) {
    $countsB{$value}++;
    $seen{$value} = 1;
}
$diff = 0;
foreach $value (keys %seen) {
    # A value missing from one list counts as zero occurrences there.
    $countsA{$value} = 0 if not defined $countsA{$value};
    $countsB{$value} = 0 if not defined $countsB{$value};
    $diff += abs($countsA{$value} - $countsB{$value});
}
EDIT: If you are expecting all lists to be similar to some global target, you could calculate the count means of each value over all sets. Then instead of comparing two sets (as above), you could compare each set to the means. | [reply] [d/l] |
Re: Comparing graph shapes or soft matching.
by choroba (Cardinal) on Jun 05, 2024 at 12:27 UTC
|
What do you mean by "line up"? Can you give an example of normal behaviour versus the behaviour that should be reported?
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
| [reply] [d/l] |
Re: Comparing graph shapes or soft matching.
by LanX (Saint) on Jun 05, 2024 at 12:59 UTC
|
> but if graphed, the lines should line up or nearly line up.
Very fuzzy problem description.
> I am not a maths expert
I am in the (worst) case.
The subgraph isomorphism problem is NP-complete; the complexity of the graph isomorphism problem is still unresolved.
This doesn't mean that you won't find a practical algorithm in over 90 percent of the cases. But once you hit a hard one it will take ages to complete.
Of course this could (likely) be an easy XY problem*, but without SSCCE how can we possibly tell. 🤷🏻♂️
*) graph vs chart, apple vs oranges, tomato vs potato, 🥔 vs 🍅, ... | [reply] |
Re: Comparing graph shapes or soft matching.
by tweetiepooh (Hermit) on Jun 28, 2024 at 08:35 UTC
|
Sorry for being too vague, I guess it is one of those situations where the question is obvious to me even from the initial statement that I can't believe no-one else sees it as such.
What I have is a row count from a number of databases plotted against time. Rows are added to one database and are copied to the others. In a normal situation the graphs will line up or nearly line up: the row count in each database will be the same or nearly the same, but because copy and sample times can vary there can be delays to the row counts on one or more databases. (Additionally, it is possible for lots of rows to be added and then removed from the main database before the copy mechanism kicks in, so those rows may not make it to one or more copies at all; this is not a problem.)
An error situation needs reporting where the counts get out of sync and remain out of sync, shown on the plots by the lines not lining up at all. The error can "clear" if and when the plots line up again (or nearly so).
Thinking things over, I would need to try to line up the time values and then compare the row counts, and report where the row counts differ more than a certain number of times in succession by a sufficient value to worry about.
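A rough sketch of that approach, under several assumptions that are not in the post: each series is an array of [epoch_seconds, row_count] pairs sorted by time, "lining up" means pairing each main-database sample with the latest copy sample taken at or before it, and the thresholds are arbitrary. The name out_of_sync_times and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: pair each primary sample with the latest replica
# sample taken at or before it, and return the primary timestamps at which
# the counts have differed by more than $max_diff for at least $min_run
# consecutive samples.
sub out_of_sync_times {
    my ($primary, $replica, $max_diff, $min_run) = @_;
    my ($i, $run, $last, @alerts) = (0, 0, undef);
    for my $p (@$primary) {
        my ($time, $count) = @$p;
        # Advance through replica samples that are not newer than this one.
        while ($i < @$replica && $replica->[$i][0] <= $time) {
            $last = $replica->[$i++][1];
        }
        next unless defined $last;    # no replica sample yet
        # Extend the run while out of tolerance, reset it once back in sync.
        $run = abs($count - $last) > $max_diff ? $run + 1 : 0;
        push @alerts, $time if $run >= $min_run;
    }
    return @alerts;
}

# Made-up sample data: the copy stalls at 110 rows, then catches up.
my @main = ([0, 100], [60, 110], [120, 120], [180, 130], [240, 140], [300, 150]);
my @copy = ([5, 100], [65, 110], [125, 110], [185, 110], [245, 110], [305, 150]);
my @alerts = out_of_sync_times(\@main, \@copy, 15, 2);
print "out of sync at t=$_\n" for @alerts;
```

Because the run counter resets as soon as a pair is back within tolerance, the alert "clears" on its own once the plots line up again. The tolerance has to be larger than the drift that normal copy delay produces, otherwise every sample looks out of sync.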
| [reply] |
|
You obviously have a clear picture in mind, which is hard to communicate verbally.
The easiest way to ask this question is to show us some sample plots, the associated data and your interpretation of "alikeness".
Either by ASCII graphic in a code section or by sharing a link to a freely hosted picture °
+ x
* x + etc
x +
I already gave you the standard approach in mathematics, which is to calculate the mean of the squared differences.
You might tell us why this doesn't solve the problem.
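For illustration, a minimal sketch of the mean of squared differences, assuming the two series have already been sampled at the same points and have equal length. The function name and the numbers are made up:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: mean of the squared point-by-point differences
# between two equally long series (array references).
sub mean_squared_difference {
    my ($x, $y) = @_;
    die "series must be non-empty and of equal length\n"
        unless @$x && @$x == @$y;
    return sum(map { ($x->[$_] - $y->[$_]) ** 2 } 0 .. $#$x) / @$x;
}

my @main_db = (10, 20, 30, 40);    # made-up row counts
my @copy_db = (10, 22, 29, 44);
my $msd = mean_squared_difference(\@main_db, \@copy_db);
printf "mean squared difference: %.2f\n", $msd;
```

A result of 0 means the lines coincide exactly; what counts as "nearly lining up" is a threshold you would have to pick from your own data.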
Update
°) Another possibility is sharing a Google doc spreadsheet including data and plotting | [reply] [d/l] |
|
If the data can be exported to something like CSV, it can be plotted with gnuplot as nice ASCII art.
Data-example (incomplete):
date_trunc;temperature;humidity;oventemperature
2024-06-28 16:25:31;22.5;99.9;27.2
2024-06-28 16:25:43;22.59;99.9;27.2
2024-06-28 16:25:53;22.59;99.9;27.2
2024-06-28 16:26:04;22.59;99.9;27.2
2024-06-28 16:26:15;22.59;99.9;27.2
2024-06-28 16:26:18;22.59;99.9;27.2
2024-06-28 16:26:28;22.59;99.9;27.2
2024-06-28 16:26:39;22.59;99.9;26.5
2024-06-28 16:26:49;22.59;99.9;27
...
Gnuplot-file for this:
set title "Bedroom\nTemperature BME"
set datafile separator ";"
# Set virtual terminal size to 120x30 and print to STDOUT
set terminal dumb size 120,30
set xlabel "Logtime"
set ylabel "Sensorvalue"
set xdata time
#set timefmt "%Y-%m-%d"
#set xrange ["03/21/95":"03/22/95"]
set format x "%d.%m\n%H:%M"
set timefmt "%Y-%m-%d %H:%M:%S"
set key autotitle columnhead outside
plot "bme.csv" using 1:2 title "Room temp(Celsius)" with lines, \
"dht.csv" using 1:4 title "Oven temp (Celsius)" with lines
"bme.csv" using 1:2 specifies the filename and which two columns to use for a given plotline (first one is always the timestamp in my case). You can use the same file with different columns, or use different files altogether.
Result:
[ASCII plot from gnuplot's dumb terminal; the layout was lost to line wrapping. Title: "Bedroom / Temperature BME". X axis: "Logtime", ticks from 28.06 16:15 to 28.06 18:30. Y axis: ticks from 22 to 28. Legend: "Room temp(Celsius)" drawn with *, staying between 22 and 23; "Oven temp (Celsius)" drawn with #, mostly near 27 with some points down to about 23.]
| [reply] [d/l] [select] |
|
| [reply] |