any statisticians on here?

Soldato
Joined
18 Oct 2012
Posts
8,332
evening all.

my brain is failing me and i figured i'd consult the ocuk hive mind to see if anyone has any ideas/suggestions for a little problem i'm having.

so essentially, i have 4 tests. in each test i have 5 samples, which are plotted as an X-Y scatter (which produces a line for each sample).

for each batch if you plot all 5 samples on a standard scatter chart you get a bunch of lines with a level of variation between them (one sample might be a bit weaker, bit stronger etc, noise basically).

needless to say, plotting all 40 lines on a single chart results in something that's just far too confusing to read.

what i want to do is figure out some method of representing this in a way where i can have something simple and visually distinctive by representing a batch of 5 samples as an average, with a "cloud" around it based on the deviation between all 5 samples.

here's an example of what i'm thinking with obligatory top tier paint skills, black is the "average" line with red being the error cloud:
AGCdyh7.png

problem is the variation in the samples isn't just as simple as the Y values varying, but the X values too. so at, say, row 500 on the spreadsheet, sample 1 might have an X&Y of 10&15, but sample 2 might be 20&30, sample 3 might be 10&30 etc., so i'm struggling to think of how i'm gonna do it numerically.

edit- forgot to mention, the X values for each sample are one-way, as in the next row down will never be smaller than the ones before it, although it might not move for a while then jump.
 
Soldato
OP
Joined
18 Oct 2012
Posts
8,332
what do the X and Y axes represent?

the X axis is a sort of percentage completion. the tests are actually measuring a progression in mm, but as the start points and lengths don't match up perfectly i figured it'd be easier to represent it as how far along it's progressed (so all the charts start and end at the same place).
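for anyone wanting to reproduce that rescaling outside excel, here's a minimal sketch of the idea, assuming each sample's positions are a plain list of mm readings that never decrease. the function name and the numbers are made up for illustration, they're not from the actual test data.

```python
def to_percent_complete(xs):
    """Rescale non-decreasing positions so the first is 0 % and the last is 100 %."""
    start, end = xs[0], xs[-1]
    span = end - start
    if span == 0:
        raise ValueError("sample never progresses, cannot normalise")
    return [100.0 * (x - start) / span for x in xs]


# Hypothetical progression readings in mm for one sample (note the repeated
# value: the position is allowed to sit still for a while, then jump).
sample_mm = [12.0, 12.0, 14.5, 19.0, 22.0]
print(to_percent_complete(sample_mm))
```

every sample then runs from 0 to 100 regardless of where it started or how long it was, which is what lets the charts line up.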

I think the whale ate Nemo....
Had a quick go, hopefully this works for you?
Best I can do.


well, thread delivers already :p

Got the raw data?

Considered using interactive plots to make the data easier to filter and drill into without losing the raw values?

"Error envelope" is likely what you're looking for. R and ggplot are pretty common tools so found this for ya.

https://www.r-graph-gallery.com/104-plot-lines-with-error-envelopes-ggplot2.html

error envelope does sound like the kind of thing i'm after, kinda like your bog standard error bars. is it possible/practical/sane to account for both axes at the same time?

i've been working in excel up until now, as that's what i'm most familiar with, but this won't be the first time i've had to question whether or not i'm using the right tool for the job.
 
Soldato
Joined
9 Mar 2010
Posts
2,838
If ya pastebin the raw data I'll have a quick play for ya.

Not sure excel is up to the task, but python, R or matlab would be my suggestions (or maybe just the Plotly interactive graphs they offer on their website might do it).

With regards to "what's correct" it's really just a matter of what gets your point across. Nothing stopping you representing each datapoint as a picture of a cat and showing the error as pickles.

Edit: as an aside, if you're doing any of this for the purpose of a masters, phd or research, the more reproducible you can make your analysis the better. Actually, regardless of what you're doing it for, by programming your analysis you make it repeatable for any further experiments you run as well, and less prone to error.
 
Soldato
OP
Joined
18 Oct 2012
Posts
8,332
spent most of today playing around and managed to get to this point after abandoning trying to do both axes simultaneously:
ZAS6cGv.png

moved away from the percentages and back to the raw data, but took the starting values off so at least everything's starting from zero.

made a new x axis that moves in fixed, repeatable steps and then matched up the corresponding y values for each test, which meant i could average them and then take the standard deviation to get the upper and lower bounds.
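for reference, the same steps can be sketched in python using only the standard library. this is a rough sketch under assumptions, not what the spreadsheet actually does: it assumes straight-line interpolation is used to find each sample's y at the shared x positions, and the helper names (`zero_start`, `interp`, `envelope`) and the three tiny samples are made up.

```python
from bisect import bisect_right
from statistics import mean, stdev


def zero_start(xs):
    """Subtract the starting value so every sample begins at x = 0."""
    return [x - xs[0] for x in xs]


def interp(x_new, xs, ys):
    """Linearly interpolate y at x_new; xs must be non-decreasing.

    Outside the sample's x range the nearest end value is returned.
    On a run of repeated x values, the last y of the run is used.
    """
    if x_new <= xs[0]:
        return ys[0]
    if x_new >= xs[-1]:
        return ys[-1]
    hi = bisect_right(xs, x_new)  # first index whose x is greater than x_new
    lo = hi - 1
    frac = (x_new - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + frac * (ys[hi] - ys[lo])


def envelope(samples, grid):
    """samples: list of (xs, ys) pairs. Returns (means, lower, upper) lists."""
    means, lower, upper = [], [], []
    for gx in grid:
        ys_at_gx = [interp(gx, xs, ys) for xs, ys in samples]
        m = mean(ys_at_gx)
        s = stdev(ys_at_gx)  # sample standard deviation (n - 1 divisor)
        means.append(m)
        lower.append(m - s)
        upper.append(m + s)
    return means, lower, upper


# Hypothetical batch of three samples with mismatched x positions.
batch = [
    (zero_start([0, 1, 2]), [0, 10, 20]),
    (zero_start([5, 7]), [0, 24]),
    (zero_start([0, 0.5, 2]), [0, 7, 28]),
]
grid = [0.0, 1.0, 2.0]  # the shared, evenly stepped x axis
means, lower, upper = envelope(batch, grid)
for gx, m, lo_b, hi_b in zip(grid, means, lower, upper):
    print(gx, m, lo_b, hi_b)
```

the mean line is the black "average" and the lower/upper lists are the edges of the cloud. this only captures spread in y at a given x, so it's the single-axis version, same as the chart above.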

not 100% sure that a direct average/standard error is the best showcase but it's going the right direction.
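on the deviation vs error point: the standard deviation describes how spread out the 5 samples are, while the standard error of the mean is that deviation divided by the square root of the sample count and describes how uncertain the average itself is. a small sketch of the difference, with made-up y values for a single x position (the `spread` name is just for illustration):

```python
from math import sqrt
from statistics import mean, stdev


def spread(values):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(values)
    sd = stdev(values)   # spread of the individual samples
    sem = sd / sqrt(n)   # uncertainty of the average; shrinks as n grows
    return mean(values), sd, sem


# Hypothetical y values from 5 samples at one x position.
avg, sd, sem = spread([10, 12, 14, 16, 18])
print(avg, sd, sem)
```

with only 5 samples the standard error band is noticeably narrower than the standard deviation band, so which one makes the better "cloud" depends on whether the chart is meant to show sample-to-sample variation or confidence in the average.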

of course overlaying this to get all 4 batches on one chart is going to be fun with excel's janky formatting.
 