From: Laurent <lutgenl@icloud.com>
Subject: Re: Some advice required [OT]
Date: Tue, 28 Dec 2021 01:05:49 -0800 (PST)
Message-ID: <875d209a-9504-4cdb-86cd-ce9b220a4a92n@googlegroups.com>
In-Reply-To: <7f50b560-9d28-4572-a90c-7488fb27582en@googlegroups.com>
On Tuesday, 28 December 2021 at 08:48:32 UTC+1, Laurent wrote:
> On Tuesday, 28 December 2021 at 01:29:57 UTC+1, Ben Bacarisse wrote:
> > Laurent <lut...@icloud.com> writes:
> >
> > > On Monday, 27 December 2021 at 21:49:18 UTC+1, Ben Bacarisse wrote:
> > >> Laurent <lut...@icloud.com> writes:
> > >>
> > >> > On Monday, 27 December 2021 at 14:14:42 UTC+1, Ben Bacarisse wrote:
> > >> >> Laurent <lut...@icloud.com> writes:
> > >> >>
> > >> >> > On Monday, 27 December 2021 at 12:16:27 UTC+1, Niklas Holsti wrote:
> > >> >> >
> > >> >> >> Sorry, but I found your problem description impossible to understand.
> > >> >> >> Try to describe more clearly the experiment that is done, the structure
> > >> >> >> of the data the experiment provides (the meaning of the Excel rows and
> > >> >> >> columns), and the statistic you want to compute.
> > >> >> >
> > >> >> > Sorry tried to keep it short, was too short.
> > >> >> >
> > >> >> > Columns are the antimicrobial drugs
> > >> >> > Rows are the microorganism.
> > >> >> >
> > >> >> > So every cell contains a result of S, I, R or simply an empty cell
> > >> >> >
> > >> >> > S = Susceptible
> > >> >> > I = Intermediate
> > >> >> > R = Resistant
> > >> >> >
> > >> >> > empty cell <S<I<R
> > >> >> >
> > >> >> > If a patient has 3 strains of the same microorganism but with
> > >> >> > different resistance profiles I have to find the most resistant
> > >> >> > one. Or if they are different I keep them all.
> > >> >> >
> > >> >> > I have no idea how to explain what I am doing to the compiler.
> > >> >> I think when you can explain it to people, you'll be able to code it. I
> > >> >> am still struggling to understand what you need.
> > >> >> > Why I would choose result from strain B over the result from strain A.
> > >> >> >
> > >> >> > strain A: SSSRSS
> > >> >> > strain B: SSRRRS
> > >> >> Let's space it out
> > >> >>
> > >> >>           drug 1  drug 2  drug 3  drug 4  drug 5  drug 6
> > >> >> strain A  S       S       S       R       S       S
> > >> >> strain B  S       S       R       R       R       S
> > >> >>
> > >> >> You want to choose B because it is resistant to more drugs, yes?
> > >> >>
> > >> >
> > >> > Yes indeed
> > >> >
> > >> >> I think, from the ordering you give, you need a measure that treats an R
> > >> >> as "more important" than any "I" which is "more important" than an "S".
> > >> >> (We will come to empty cells later.)
> > >> >>
> > >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
> > >> >> number. In base 10, the strains score
> > >> >>
> > >> >>           R  S  I
> > >> >> strain A  1  5  0  = 150
> > >> >> strain B  3  3  0  = 330
> > >> >>
> > >> >> Now, in fact, you don't need to use base 10. The smallest base you can
> > >> >> use is one more than the maximum number of test results. If there can
> > >> >> be up to 16 tests (say) the score is
> > >> >>
> > >> >> n(R)*17*17 + n(S)*17 + n(I).
> > >> >>
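For reference, a minimal Python sketch of the count-as-digits score quoted above. The function name `count_score` and the `base` parameter are made up for illustration; the digit order (R, then S, then I) follows the formula as quoted.

```python
def count_score(profile: str, base: int) -> int:
    """Score a profile by treating the counts of R, S and I as digits.

    `base` must exceed the number of results in a profile, otherwise a
    count could overflow into the next digit.
    """
    if base <= len(profile):
        raise ValueError("base must be larger than the number of results")
    return (profile.count("R") * base * base
            + profile.count("S") * base
            + profile.count("I"))


print(count_score("SSSRSS", 10))  # strain A -> 150
print(count_score("SSRRRS", 10))  # strain B -> 330
```

With base 17 the same two strains score 374 and 918, so the ordering between them does not change.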
> > >> >> If this suits your needs, we can consider empty cells later on. It's
> > >> >> not at all clear to me how to compare
> > >> >>
> > >> >> strain C R____
> > >> >> strain D RRSSSS
> > >> >>
> > >> >> Strain C is "less resistant" but only because there is not enough
> > >> >> information. In fact it seems more serious as it is resistant to all
> > >> >> tested drugs.
> > >> >>
> > >> >
> > >> > Strain C is probably garbage and I would remove it. With a bit of luck
> > >> > there will be another result with the same sample ID that is complete.
> > >> >
> > >> >> And then what about
> > >> >>
> > >> >> strain D SR
> > >> >> strain E RS
> > >> >>
> > >> >
> > >> > Yes those are the cases which are annoying me.
> > >> >
> > >> > That's why I came up with the idea of multiplying the value of the result
> > >> > (S=1, I=2 and R=3) with the position of the value. Tried it with
> > >> > triplets but there will still be cases where different results will
> > >> > give the same numeric value. Ignoring empty cells for the moment.
> > >> >
> > >> > Strain F: SSR (1*1+2*1+3*3) =12 and Strain G: RRS (1*3+ 2*3+3*1) = 12
> > >> > will be the same numerical value but they are different resistance
> > >> > profiles I would in this case keep both.
> > >> >
> > >> > How to prevent that from happening?
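The collision is easy to reproduce. A small Python sketch of the position-times-value sum, with positions counted from 1 on the left (the names `VALUE` and `weighted_sum` are hypothetical):

```python
# Values as given in the text: S=1, I=2, R=3.
VALUE = {"S": 1, "I": 2, "R": 3}


def weighted_sum(profile: str) -> int:
    """Sum of position * value over the profile, positions starting at 1."""
    return sum(pos * VALUE[result]
               for pos, result in enumerate(profile, start=1))


print(weighted_sum("SSR"), weighted_sum("RRS"))  # 12 12
```

A single sum cannot tell these two profiles apart, because a high value in a low position and a low value in a high position can add up to the same total.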
> > >> Can you first say why the suggestion I made is not helpful?
> > >>
> > >> --
> > >> Ben.
> > >
> > > You mean that one:
> > >
> > >> >> I think you need to treat the number of Rs, Is and Ss like digits in a
> > >> >> number. In base 10, the strains score
> > >> >>
> > >> >>           R  S  I
> > >> >> strain A  1  5  0  = 150
> > >> >> strain B  3  3  0  = 330
> > >> >>
> > >
> > > Different resistance profiles same result:
> > I don't yet understand the requirements so I am taking it in stages.
> > The first requirement seemed to be "more or less resistant". To do that
> > you can use digits in a large enough base but this will make the number
> > of Rs, Ss and Is paramount. Is that acceptable as a first step?
> >
> The requirement is to keep one strain of a given microorganism per patient:
> the most resistant one, or all of them if they have different profiles.
>
> SRS vs RRS => last one, more Rs
>
> SRS vs RSR => both, different profiles
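One possible reading of these two examples, sketched in Python: drop a profile only when another profile is at least as resistant for every single drug, and keep both when neither covers the other. This is an interpretation, not something settled in the thread. The names `RANK`, `dominates` and `keep` are hypothetical, and so is the use of "_" for an empty cell; the ranking itself is the empty < S < I < R order given earlier.

```python
# Ordering from earlier in the thread: empty cell < S < I < R.
# Writing an empty cell as "_" is an assumption of this sketch.
RANK = {"_": 0, "S": 1, "I": 2, "R": 3}


def dominates(a: str, b: str) -> bool:
    """True if `a` is at least as resistant as `b` for every drug."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same drugs")
    return all(RANK[x] >= RANK[y] for x, y in zip(a, b))


def keep(profiles: list[str]) -> list[str]:
    """Drop every profile that another, different profile dominates."""
    unique = list(dict.fromkeys(profiles))  # drop exact duplicates, keep order
    return [p for p in unique
            if not any(q != p and dominates(q, p) for q in unique)]


print(keep(["SRS", "RRS"]))  # ['RRS']
print(keep(["SRS", "RSR"]))  # ['SRS', 'RSR']
```

No numeric score is involved here, so two different profiles can never be confused with each other just because their scores happen to coincide.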
> > In order to help people to be able to make further suggestions, maybe
> > you could give the relative ordering you would like to see between the
> > following sets of profiles. For example, between SSR, SRS and RSS, I
> > think the order you want is RSS > SRS > SSR.
> >
> > 1: SSR, SRS, RSS
> >
> > 2: RSI, RIS, SRI, SIR, IRS, ISR
> >
> > 3: SSSR, SSRS, SRSS, RSSS
> >
> > 4: RRSSS, RSSSR, RIIII, SRIII, RSIII, IIIRS, IIISR
> >
> The order of the results is given by the ID of the drug in the extraction tool.
> I could probably order them by family and hierarchy of potency but
> would that make a difference?
> > It's possible you could make do with an extra field (or digits) that
> > gives some measure of the relative ordering between otherwise similar
> > sequences. For example, using base 10 (for convenience of arithmetic)
> > both RRSSI and RSRSI would score 212xx but the last xx would reflect the
> > positioning of the results in the sequence. There are lots of ways to do
> > this. One way would be to use, as you were thinking, some sort of weighted
> > count. Using S=0, I=1 and R=2 with weights
> >
> > 54321
> > RRSSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+4) + 0*(3+2) + 1*1 = 21219
> > RSRSI scores 2*10000 + 1*1000 + 2*100 + 2*(5+3) + 0*(4+2) + 1*1 = 21217
> >
> So to be sure that I am following:
>
> 2*(5+4) = value of R (=2) * position of R(@5 and @4)
> 2*(5+3) = value of R (=2) * position of R(@5 and @3)
>
> 0*(3+2) = value of S (=0) * position of S(@3 and @2)
> 0*(4+2) = value of S (=0) * position of S(@4 and @2)
>
> 1*1 = value of I (=1) * position of I (@1)
>
> 2*10000 + 1*1000 + 2*100 is just used as padding? So 212 could be any other
> number?
>
Eh, forget the last sentence, brain fart: I have 2 R's so 2*10000, 1 I so 1*1000 and 2 S's so 2*100.
> But in this example I would have to keep both as drugs 5, 2 and 1 are common
> to both results but 4 and 3 are unique.
>
> The score would be completely misleading.
>
> So if my table has a width of 20 columns the first column would be
> 10^20, the next 10^19,.... +/- a few 0s off?
>
> I would have to implement it and see what I get as a result.
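A rough Python sketch of what such an implementation could look like, following the worked example quoted above: the counts of R, I and S become the leading digits, and a position-weighted sum with S=0, I=1, R=2 fills the last two digits. The names `TIE_VALUE` and `combined_score` are hypothetical.

```python
# Values used only for the position-weighted part: S=0, I=1, R=2.
TIE_VALUE = {"S": 0, "I": 1, "R": 2}


def combined_score(profile: str) -> int:
    """Counts of R, I and S as leading digits plus a positional tie-break.

    Base 10 is used for readability, as in the worked example. It only
    holds while every count is at most 9 and the tie-break stays below 100.
    """
    n = len(profile)
    counts = (profile.count("R") * 10_000
              + profile.count("I") * 1_000
              + profile.count("S") * 100)
    # The leftmost result has weight n, the rightmost has weight 1.
    tie_break = sum((n - i) * TIE_VALUE[r] for i, r in enumerate(profile))
    return counts + tie_break


print(combined_score("RRSSI"))  # 21219
print(combined_score("RSRSI"))  # 21217
```

For a table 20 columns wide, a count can reach 20 and the weighted part can reach 2*(20+19+...+1) = 420, so the base 10 digits would run into each other and a larger base per field would be needed. The width of the table changes the size of each field, not the number of fields.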
> > If you absolutely must never get duplicate numbers, but you still want
> > to preserve a strict specified ordering, I think you will have much more
> > work to do.
> >
> > Getting a unique number for each case is trivial (but the ordering will
> > be wrong) and getting an ordering that rates every R > every S > every I
> > is also trivial, but there will be lots of duplicates. It's finding the
> > balance that's going to be hard.
> >
> > --
> > Ben.
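To illustrate the "unique number is trivial" half of that remark, one simple option is to read the whole profile as a base-4 number, one digit per drug. This is only a sketch; the names `DIGIT` and `encode` and the "_" for an empty cell are assumptions.

```python
# One base-4 digit per result; "_" for an empty cell is an assumption.
DIGIT = {"_": 0, "S": 1, "I": 2, "R": 3}


def encode(profile: str) -> int:
    """Read the profile as a base-4 number, leftmost drug first."""
    number = 0
    for result in profile:
        number = number * 4 + DIGIT[result]
    return number


print(encode("SSR"), encode("RRS"))  # 23 61
print(encode("RSS"), encode("SRR"))  # 53 31
```

Profiles of the same length always get different numbers, but the ordering follows the leftmost drug rather than the number of Rs: RSS ends up above SRR although SRR is resistant to more drugs.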
> I have prepared a cleaned up Excel workbook with only the duplicates which
> pose problems. The ones I would keep have an orange ID.
> I could upload it to GitHub, if that helps with understanding the different cases.
>
> Thanks for your patience
>
> Laurent
Thread overview: 25+ messages
2021-12-27 9:21 Some advice required [OT] Laurent
2021-12-27 11:16 ` Niklas Holsti
2021-12-27 12:29 ` Laurent
2021-12-27 13:14 ` Ben Bacarisse
2021-12-27 18:24 ` Laurent
2021-12-27 19:51 ` Dennis Lee Bieber
2021-12-27 20:49 ` Ben Bacarisse
2021-12-27 22:09 ` Laurent
2021-12-28 0:29 ` Ben Bacarisse
2021-12-28 7:48 ` Laurent
2021-12-28 9:05 ` Laurent [this message]
2021-12-28 12:54 ` Laurent
2021-12-28 13:57 ` Ben Bacarisse
2021-12-28 18:19 ` Laurent
2021-12-28 13:43 ` Ben Bacarisse
2021-12-28 16:49 ` Dennis Lee Bieber
2021-12-29 4:20 ` Randy Brukardt
2021-12-27 17:41 ` Dennis Lee Bieber
2021-12-27 18:56 ` Niklas Holsti
2021-12-27 19:44 ` Laurent
2021-12-28 2:10 ` Randy Brukardt
2021-12-28 6:02 ` Laurent
2021-12-29 3:58 ` Randy Brukardt
2021-12-27 17:18 ` Simon Wright
2021-12-27 18:30 ` Laurent