Data information

Sample information

Previously amplified PCR products from a patient sample and the control pNL4-3 HIV clone plasmid (Adachi et al., 1986) for the HIV work were kindly supplied by the irsiCaixa AIDS Research Institute (Dr M. Noguera-Julian).

Sample amplification

The HIV PCR mastermix contained 1x colourless GoTaq® buffer, 100nM of each primer, 200µM dNTP mix, 1.5mM magnesium chloride and 0.02 units/µl GoTaq® DNA polymerase. DNA was added at 1ng/µl. The thermocycler program consisted of 95°C for 2 minutes followed by 40 cycles of 95°C for 30 seconds, 50°C for 30 seconds and 72°C for 30 seconds. The samples were then held at 10°C.

Primer pairs:
Chip 1: forward primer 5’ phosphate-GAGCTTCAGGTTTGGGGA and reverse primer 5’ Cy5-CTGAGTCAACAGATTTCTTCC
Chip 2: forward primer 5’ phosphate-TGAGTTTGCCAGGAAGATGG and reverse primer 5’ Cy5-ATTGTATGGATTTTCAGGCCC
Chip 3: forward primer 5’ phosphate-GTTAAACAATGGCCATTGACAG and reverse primer 5’ Cy5-CTACTTTGGAATATTGCTGGTG
Chip 4: forward primer 5’ phosphate-GATATCAGTACAATGTGCTTCC and reverse primer 5’ Cy5-CCAGTTCTAGCTCTGCTTC

Target preparation: Quantification of amplified targets

DNA was quantified in one of two ways. For Cy5 labelled products, DNA was directly quantified using the Nanodrop 2000 with the microarray option and the Cy5 setting, according to the manufacturer’s instructions. A continuous fluorescence reading was taken over the temperature range of 20 to 30°C. The readings were then averaged for each sample, and those for the dilution series of the labelled primer were used to construct a calibration curve of fluorescence vs concentration. This curve was then used to determine the concentration of the samples.

PCR purification

All PCR products were purified using the QIAquick PCR purification kit (QIAGEN). The manufacturer’s instructions were followed with the following modifications. Eight 50µl PCR reactions were pooled and then purified as one sample.
The mixture of the binding buffer, pH indicator and PCR sample was neutralised with 30-50µl of 3M sodium acetate solution, using the colour of the pH indicator as a guide (as described in the manufacturer’s protocol), and the purified PCR products were eluted in 30µl of EB buffer that was passed through the binding column twice to maximise recovery.

Digestion of PCR products to single stranded target

PCR products were digested using lambda exonuclease to remove the strand labelled with a phosphate group at the 5’ end (added during PCR using a phosphorylated primer). The digestion reaction consisted of 1x digestion buffer and 5 units of lambda exonuclease. PCR product was added to ensure that the subsequent hybridisation reaction would have a PCR product concentration of 20-75nM. The exception was the Selector generated targets, where the concentration was adjusted to ensure that each target would have a final concentration of 12.5nM in the subsequent hybridisation mixture.

Array information:

The re-sequencing and genotyping probes were designed against sequence data supplied by the irsiCaixa AIDS Research Institute, which was in line with the European Nucleotide Archive sequence K03455.1 (except for variant positions). The re-sequencing probes were produced as described in the array design section. The genotyping probes were aimed at discrete regions of variation, where one probe was constructed to be complementary to each of the known sequence variants for the region covered by each 19bp probe. Due to the number of probes required to cover the protease and reverse transcriptase genes, the probes were split across four arrays, such that the re-sequencing style and genotyping probes for the same regions were kept on the same arrays.
In this regard:
Chip 1 re-sequenced bases 2241-2438 of the reference sequence, which corresponded to codons 1-62 of the protease gene.
Chip 2 re-sequenced bases 2394-2687, which corresponded to codons 48-99 of the protease gene and codons 1-46 of the reverse transcriptase gene.
Chip 3 re-sequenced bases 2641-2974, which corresponded to codons 31-153 of the reverse transcriptase gene.
Chip 4 re-sequenced bases 3010-3410, which corresponded to codons 154-287 of the reverse transcriptase gene.

Hence a total of 8933 unique genotype probes and 4892 separate sequencing probes were printed in the following number of replicates, in order to comply with the maximum number of probes possible within the required printing area of the microarray slide: chip 1 probes were printed in triplicate, chip 2 probes in quadruplicate, chip 3 probes in triplicate and chip 4 probes in sextuplicate.

Assay information:

The pre-run slide blocking procedure was that described by Taylor et al. (2003). The hybridization solution consisted of 5x SSC, 0.01% Tween 20 and 1x SYBR Green (Life Technologies). The hybridization solution was mixed with the target oligo to give a final concentration of 20nM. Samples were run on the HybLive using the following program: standard system priming step (according to the manufacturer’s default; Marcy et al. 2008); addition of sample either by the injection port or directly into the hybridization chamber containing the microarray slide; sample denaturation at 85°C; a dynamic hybridization step (consisting of cooling the system from 85 to 20°C at the standard rate of 8°C/min); a 2 minute post hybridization wash (HEN buffer comprising 0.1M HEPES and 10mM sodium free EDTA, adjusted to pH 8.0 with sodium hydroxide, with sodium chloride added to give a final sodium concentration of 250mM, accounting for the sodium hydroxide added); melt-curve data acquisition (consisting of a temperature ramp from 20°C to 85°C at 1°C per minute with image acquisition every 18 seconds); a post melt wash in 0.1x SSC (at 85°C); and final system cooling to 20°C. Active mixing was used in all steps following the image set up to ensure that the sample and washes were evenly distributed over the entire chip.

Analysis information:

Initial analysis was performed using the manufacturer’s supplied software (HybLyzer, Genewave, now Mobidiag), which performed the initial extraction of the fluorescence intensity from each feature on the array over the series of TIFF images (included in this file). The series of intensities from each spot was then constructed into a melt-curve for each array feature by plotting the intensity readings against the temperature of the array over the course of the melt-curve analysis stage of the experiment. This raw plot was then smoothed using the standard smoothing setting in the HybLyzer software, and the plots were then converted to the negative first derivative of fluorescence vs temperature, again using the standard processing in the HybLyzer software supplied with the HybLive instrument.
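The smoothing and derivative steps described above can be illustrated with a minimal sketch. The HybLyzer smoothing algorithm is not described in this document, so the centred moving average used here is only a stand-in, and the function names `moving_average` and `negative_first_derivative` are hypothetical. The synthetic curve in the demo uses the acquisition settings given above (20°C to 85°C at 1°C per minute with an image every 18 seconds, i.e. one reading every 0.3°C).

```python
import math


def moving_average(values, window=5):
    """Centred moving average; the window shrinks at the two ends.

    Assumption: a stand-in for the (undocumented) HybLyzer smoothing.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def negative_first_derivative(temps, fluorescence):
    """Return (midpoint temperatures, -dF/dT) from finite differences."""
    mids, neg_dfdt = [], []
    for i in range(len(temps) - 1):
        dt = temps[i + 1] - temps[i]
        mids.append((temps[i] + temps[i + 1]) / 2)
        neg_dfdt.append(-(fluorescence[i + 1] - fluorescence[i]) / dt)
    return mids, neg_dfdt


if __name__ == "__main__":
    # Synthetic melt-curve: fluorescence falls sigmoidally around 55 degrees C.
    temps = [20 + 0.3 * i for i in range(217)]
    fluor = [1000 / (1 + math.exp((t - 55) / 1.5)) for t in temps]
    mids, deriv = negative_first_derivative(temps, moving_average(fluor))
    peak_temp = mids[deriv.index(max(deriv))]
    print(f"Derivative peak near {peak_temp:.2f} degrees C")
```

A loss of fluorescence on melting appears as a positive peak in the negative first derivative, which is why peaks in these plots mark probe dissociation temperatures.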
The raw data include the full dataset collected over the entire melt-curve; however, the analysed data only include the data collected between 35-70℃, as this was the part of the melt-curve used for calling sequencing and genotyping results, as well as for performing quantitative analysis. The loss of non-specifically bound products below 35℃ led to noisy and uninterpretable data. Re-sequencing and genotyping calls were made based on the first derivative plots, where the “correct call” was determined to be the probe that gave the first derivative peak at the highest temperature between 35-70℃. Where more than one peak was present in the temperature range, the highest temperature peak was the one taken into consideration when making a re-sequencing or genotyping call.

Where quantitative analysis was required, once the initial data extraction described above had been performed, the data for the resultant smoothed, first derivative melt-curves were exported in CSV format. The data were then imported into Excel (Microsoft) as averaged values taken across all replicates of a given probe (i.e. the replica spot data for each probe were averaged and only data for one plot per probe were exported for further analysis). Because some plots showed unexpected sharp drops in fluorescence below 35°C in a few runs, and because the quantitative analysis used only the part of the melt-curve plot above 35°C, data collected below 35°C were excluded from the analysis. To account for differences in hybridisation efficiency across the chip and between runs, the first derivative plots for each probe were then normalised to the maximum peak height seen in each plot, such that this became 100 and all other values were scaled to it. Peak heights, as well as the estimates of the percentage of the minor allele present, were determined as described in section 2.12 of the manuscript.
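As an illustration of the calling rule and the normalisation described above, the following is a minimal sketch only. The function names, the simple local-maximum peak finder and the `min_height` noise threshold are assumptions made for this example; the actual calls were made from the first derivative plots produced by the HybLyzer software.

```python
def normalise_to_max(curve):
    """Scale a first derivative plot so that its maximum value becomes 100."""
    peak = max(curve)
    if peak <= 0:
        raise ValueError("curve has no positive peak to normalise to")
    return [100.0 * value / peak for value in curve]


def find_peaks(temps, curve, t_min=35.0, t_max=70.0, min_height=0.0):
    """Return (temperature, height) for local maxima inside the window.

    Assumption: a point counts as a peak if it is higher than the previous
    point, not lower than the next, and above min_height.
    """
    peaks = []
    for i in range(1, len(curve) - 1):
        in_window = t_min <= temps[i] <= t_max
        is_maximum = curve[i] > curve[i - 1] and curve[i] >= curve[i + 1]
        if in_window and is_maximum and curve[i] > min_height:
            peaks.append((temps[i], curve[i]))
    return peaks


def call_base(temps, curves_by_base, t_min=35.0, t_max=70.0, min_height=0.0):
    """Call the probe whose highest-temperature peak is the highest overall.

    Returns (base, peak temperature), or (None, None) if no probe has a peak
    between t_min and t_max.
    """
    best_base, best_temp = None, None
    for base, curve in curves_by_base.items():
        peaks = find_peaks(temps, curve, t_min, t_max, min_height)
        if not peaks:
            continue
        top_temp = max(temp for temp, _ in peaks)
        if best_temp is None or top_temp > best_temp:
            best_base, best_temp = base, top_temp
    return best_base, best_temp
```

Note that the call depends on peak temperature rather than peak height, so a lower but later-melting peak is preferred over a taller peak at a lower temperature.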
When performing quantitative analysis by comparing peak heights (by simply calculating the height of the smaller peak in question as a percentage of the larger), all resultant percentages were divided by 2. This was to account for the fact that in the case of a true heterozygous mixture (i.e. a 50:50 mix of the two alleles) there would be no “minor” peak, and as such both peaks would be 100% when the actual allele frequency was only 50%. For example, a minor peak whose height is 40% of the major peak is reported as a minor allele frequency of 20%.

Documents

2005110114_nucVariants

This Excel document summarises the raw NGS data supplied by the Spanish team for the patient sample (ID 2005110114). The Gene column lists the gene targeted in the sequencing. The mutation description column gives the position in the target gene sequence followed by the reference allele and then the alternative allele seen. The call type column details whether the call is considered a putative call or has been confirmed (accepted). The position and base substitution column gives a summary of the mutation type. In the case of a single substituted base this is constructed as “s” for substitution, then the base position in the gene, followed by the alternative base seen. For those positions where one or more base substitutions cause an amino acid change, these are listed below the single base substitutions and give the full triplet using the same shorthand, but adding an “m” to denote those bases that match the reference sequence. The patient ID column lists the patient ID for the data recorded. The count in forward reads, total number of forward reads, count in reverse reads and total number of reverse reads columns are self explanatory.

HIV_SpainReference Condensed HIV results summary_Sept2016_v4_with temperature vs GC plot

This document gives a full comparison of all the resultant calls for the re-sequencing and genotyping probes taken across all 4 chips used to analyse the cloned HIV sample. Columns A through R summarise various results for the patient sample and are as follows.
Columns A and B give details of the gene and the base position within it that is being analysed in the reference sequence (K03455.1).
Columns C and D give the probe number and the chip that the probe is on (out of the 4 chips designed).
Column E shows the base that is present in the reference sequence at the corresponding position.
Column F shows the codons of the protease and reverse transcriptase genes, with the prefixes PR and RT respectively.
Column G shows the amino acid coded for in cases where this differs from the reference, and column H shows the predicted change in the amino acid at that codon.
Column I codes for differences between the genotyping and re-sequencing data obtained by ArrayDASH, with 0 signalling no difference and 1 signifying that there is a difference.
Column J shows the results of the genotyping probes. In cases where there was not a clear difference in dissociation temperature between 2 or more probes, all are listed, such that each base in the codon is listed against its respective base position and the different possibilities are separated by an underscore.
Column K defines any differences or errors using the following key:
1 - re-sequencing dropout, i.e. the signal from the probe was of such low intensity as to be uninterpretable, or all the signal was depleted in the plot by 30℃, making it indistinguishable from the signal from the non-specifically bound products in plots where there was a clear signal above 35℃. Where signals were all depleted between 30-35℃, additional requirements were considered to ensure parity with results from probes where signals had not been depleted by 35℃. Primary amongst these was the need for all 4 probes to have given signals, but also the requirement for the signal for the probe with the highest dissociation temperature to have a clear peak structure in the resultant first derivative plot.
2 - mismatch between the genotyping probe call and a homozygous re-sequencing probe call.
3 - mismatch between the genotyping probe call and a heterozygous re-sequencing probe call.
4 - there was a signal from the probe in question but it could not be translated to an interpretable result.
Column L shows the calls made when considering the re-sequencing probe data.
Column M denotes whether a variant seen was listed in the file of variants (2005110114_nucVariants) derived from the next generation sequencing of the patient HIV sample (2005110114).
Column N represents the type of mismatch relative to the reference sequence used by the Spanish team during NGS analysis (shown in column J of the file HIV_SpainReference), using the same key as used for column K.
Column O lists any mismatches between the sequences determined from the ArrayDASH re-sequencing of the patient sample and the reference sequence used by the Spanish team (again 1 indicates a mismatch and 0 no mismatch).
Column P indicates any mismatches between the Spanish team’s reference sequence and the reference sequence used to design the arrays (shown in column E), using the same 1/0 system.
Column Q shows the temperature at which the peak occurred for the base called.
Column R gives the reference sequence used by the Spanish team.

Columns S through AA summarise the data and analysis for the clone (control) sample and are as follows.
Column S details the call made using the genotyping probes, using the same logic and nomenclature as for column J.
Column T denotes the agreement or not between the results from the genotyping probes (column S) and those from the re-sequencing probes (column V), using 0/1 to denote no mismatch/mismatch respectively.
Column U codes the mismatches and no-results using the same 0-4 scale and definitions as given in the key for column K.
Column V shows the dissociation temperature (indicated by the highest temperature peak in the first derivative of fluorescence plot) for the probe corresponding to the called base.
Column X codes the disagreements between the results of the re-sequencing probes and the reference sequence used to construct the arrays (column E).
Columns AC to AI show the calculations used to determine the GC content of each probe.
Column AA shows which probes were omitted from the plot “Probe GC content and Tmax vs probe number” because they overlapped with a probe determined to have a mismatched base or a region where there was poor/dropout signal. These regions were omitted as it was expected that they would deviate from the theoretical dissociation temperatures in unknown and unpredictable ways, making a comparison unreliable.

Plots

Columns AB - BF show the values and calculations used to create the two plots below.

Probe GC content and Tmax vs probe number.
The purpose of this plot was to see how closely perfectly matched probes conformed, in terms of dissociation temperature (column AJ), to the calculated GC content (column AI).

Theoretical probe dissociation temperature based on GC percentage vs actual dissociation temperatures of probes at those GC percentages.
This plot compared the actual dissociation temperatures (column AJ) of probes with varying GC content (column AI) with those that would be expected using the calculation seen in columns BD-BF, rows 1-20.

Full patient Data_Analysis10_v24 for figure

All sheet.
The Subject and Gene columns are self explanatory. The MutDescript-1 column describes the variation from the reference sequence seen, using the shorthand position in gene: reference base / alternative bases seen or, in the case of an amino acid variant, PI-Reference AminoAcid letter base position alternative amino acid (versions if appropriate). The MutDescript-2 column also describes the mutation, but using the following nomenclature: for single base variants, s(base position, alternative base); for amino acid changes, three single base substitution descriptions are combined, using the prefix “s” where the base is a substitution from the reference and “m” where it matches. The RefBase and MutBase columns are self explanatory and are relative to the Spanish team’s reference sequence. MutBasePos gives the position of the mutant base in the HXB2R gene. The RefAA column gives the single letter code for the reference amino acid (according to the Spanish team’s reference DNA sequence) and the MutAA column gives the alternative amino acid seen due to the alternative allele(s). The MutAAPos column gives the position of the alternative amino acid in the HXB2R gene’s amino acid sequence. As before, F count gives the count for the given base in the forward reads and Rcount gives the count for the reverse reads, while the FTotal and RTotal columns give the total number of reads in the forward and reverse read data.
Count PC gives the total percentage read count for the alternative base when totalling both the forward and reverse read counts.

NonDups sheet.
This sheet summarises all the variant data with the duplicated data removed. The columns are the same as those in the All sheet.

NonDupsOver1PC.
This sheet filters the data from the NonDups sheet to remove those variants detected at less than 1% in the NGS data, as this is below the lower limit of detection determined for ArrayDASH, and so ArrayDASH would not be expected to detect these variants. The columns in this sheet are the same as in the NonDups and All sheets.

NonDupsOver1PC_BaseLevel.
This sheet filters the data in the NonDupsOver1PC sheet further to extract only those columns of data that are relevant to the further analysis.

Comparison.
This sheet is where most of the comparison between the data collected via NGS and ArrayDASH was performed.

MUTATION LIST AND FREQUENCY FROM SPAIN, FOR READ COUNTS >1% BOX
This box includes the data from the NonDupsOver1PC_BaseLevel sheet and adds the MutnPercentage column, which gives the overall percentage of reads showing the alternative base across both the forward and reverse reads, plus the MutBasePos column, which shows the position of the variant base in the HXB2R gene sequence.

DASH DATA BOX
This box gives a summary of the relevant ArrayDASH data from the patient HIV sample. MutBasePos again gives the position of the variant in the HXB2R gene. The Leicester key is a combination of the base position and the chip number that contains the probe that assays that position. LeicesterRef gives the base present in the reference used by the Leicester team to design the HIV microarrays. ObsBase gives the bases seen in the ArrayDASH data. ObsBaseFreq(%) gives the percentages of the non-reference sequence base seen in the ArrayDASH data.
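The Count PC value and the 1% filter used to build the NonDupsOver1PC sheet, as described above, can be sketched as follows. This assumes that the percentage is the pooled alternative-base count divided by the pooled read total (forward plus reverse); the spreadsheet formula itself is not reproduced in this document, and the function names and dictionary keys below are hypothetical.

```python
def alt_read_percentage(f_count, f_total, r_count, r_total):
    """Percentage of reads carrying the alternative base, pooling both strands.

    Assumption: counts and totals are summed over the forward and reverse
    reads before the percentage is taken.
    """
    total_reads = f_total + r_total
    if total_reads == 0:
        raise ValueError("no reads cover this position")
    return 100.0 * (f_count + r_count) / total_reads


def filter_over_threshold(variants, threshold_pc=1.0):
    """Keep variants at or above threshold_pc; drop those detected below it."""
    kept = []
    for variant in variants:
        percentage = alt_read_percentage(
            variant["f_count"], variant["f_total"],
            variant["r_count"], variant["r_total"],
        )
        if percentage >= threshold_pc:
            kept.append(dict(variant, count_pc=percentage))
    return kept
```

Pooling the strands before dividing weights each strand by its read depth, which differs from averaging two per-strand percentages when the forward and reverse totals are unequal.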
ALL DASH DATA NOT REFERENCE AND NOT MUTATED IN SPAIN DATA BOX
This box lists all those bases that were not the same as the Leicester reference sequence in the ArrayDASH data and that were only split between the reference base and one other base in the Spanish team’s NGS data. The Chip Keys column gives the position of the base referred to in relation to the microarrays used by the Leicester team. The Observed RefOrAlt column lists the call in the ArrayDASH data. The Peak Tmax column lists the temperature at which the first derivative peak used to make the base call occurred.

SPAIN BOX
This lists the comparable base calls made by the Spanish team using their NGS data. The RefBase, MutBase and MutnPercentage columns list the reference allele, the alternative allele and the percentage of the overall reads that contained the alternative base respectively. The ArrayDASH REF Calls column lists the base call and the percentage of that call in the ArrayDASH data. The ArrayDASH NONREF(InAdn2Ref) column lists the non-reference bases called in the ArrayDASH data and the percentage frequency of the non-reference alleles calculated from the ArrayDASH data. The final column lists those bases where the profile in the ArrayDASH data was uninterpretable. The next column lists the agreement between the two methods and whether they called a reference or alternative allele, using the following key:
Y = mutation in DASH and Spain (NGS freq 5-95%).
YY = mutation in DASH and Spain (NGS freq 100%).
D = mutation in DASH.
S = mutation in Spain.
X = position not called in DASH.
(X) = position not called in DASH, but called as mutation in Spain.
(?) = mutation in Spain at 1-5%, so ignored.
!! = Spain vs DASH report different mutation.

Summary sheet.
This sheet summarises all the comparisons made between the NGS and ArrayDASH data. The OriRow value is used to relate the data in a given row to the data used in the Graphs sheet.
The DASH=N column denotes all those positions where it was not possible to make a base call based on the ArrayDASH data. The NGS:RefBase, NGS:MutnBase and NGSMutnPercentage columns list the reference base in the Spanish reference sequence, the alternative base called by NGS and the percentage of reads assigned to the alternative base call respectively. Similarly, the DASH:RefBase, DASHRefBaseFrequency, DASH:MutBase and DASH:MutBaseFreq columns list the reference allele, the reference allele frequency, the alternative allele and the alternative allele frequency seen in the ArrayDASH data respectively. The rest of the summary sheet gives the percentages of the different calls and comparisons as explained in the text, taking into account the information in cell L7 and the data in the table covering N24:P37.

Graphs
The OriRow value is used to relate the data in a given row to the data used in the Graphs sheet. The BaseNumb column gives the base number in the HXB2R gene sequence being analysed. The DASH:MutnFreq column gives the alternative allele frequency calculated from the ArrayDASH data. The NGSMutFreq column gives the alternative allele frequency determined from the NGS data provided by the Spanish team. The call column gives a code for the agreement between the NGS and ArrayDASH results for the percentage of the alternative allele seen, using the following code:
Y = mutation in DASH and Spain (NGS freq 5-95%).
YY = mutation in DASH and Spain (NGS freq 100%).
D = mutation in DASH.
S = mutation in Spain.
X = position not called in DASH.
(X) = position not called in DASH, but called as mutation in Spain.
(?) = mutation in Spain at 1-5%, so ignored.
!! = Spain vs DASH report different mutation.
The base column gives the position in the HXB2R gene that is being assayed. The flank19Ref:%C+G column gives the percentage GC in the surrounding 19bp (9bp upstream, 9bp downstream and the assay position) for the RefBase column, which gives the reference sequence used by the Spanish team.
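The 19bp flank calculation described above (9bp upstream, 9bp downstream and the assay position) can be sketched as below. This is an illustration rather than the spreadsheet formula: the function name `flank_gc` is hypothetical, positions are assumed to be 1-based, and truncating the window at the ends of the sequence is an assumption. The second returned value is the number of A, C, G and T bases actually counted.

```python
def flank_gc(reference, position, flank=9):
    """Return (percentage GC, bases counted) for the window around a position.

    reference: reference sequence as a string of bases.
    position:  1-based position of the assayed base (assumption).
    flank:     bases taken on each side, giving 2 * flank + 1 = 19 by default.
    The window is truncated at the sequence ends (assumption), and only
    A, C, G and T are counted, so ambiguous bases are ignored.
    """
    index = position - 1
    if not 0 <= index < len(reference):
        raise IndexError("position lies outside the reference sequence")
    start = max(0, index - flank)
    end = min(len(reference), index + flank + 1)
    window = reference[start:end].upper()
    counted = sum(window.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("no unambiguous bases in the window")
    gc_bases = window.count("G") + window.count("C")
    return 100.0 * gc_bases / counted, counted
```

Returning the number of bases counted alongside the percentage makes it easy to spot windows that were shortened by a sequence end or by ambiguous bases.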
The flank19Ref:T+A+C+G column gives the total number of bases considered in the flank19Ref:%C+G column. The Clone call TM column gives the dissociation temperature of the probe with the highest dissociation temperature used to make the base call of the allele, for the re-sequencing data obtained from the HIV clone control sample. The clone data were used because the clone varied very little from the reference sequence, and so would give the truest representation of how the GC percentage of the target translated to the dissociation temperature of each of the called probes in the re-sequencing data.

The BASES SCORING AS 'N' (all flank 19 bases scored) section gives the data for all the probes that failed to give an interpretable signal (i.e. scored N) and their surrounding 19 base positions (i.e. all those base positions that overlap with the base position that failed to give an interpretable signal). Again the Base column refers to the position in the HXB2R gene. The Probe%C+G column gives the percentage GC content of the probes, taken from the flank19Ref:%C+G column for the appropriate probe. The next 2 columns were not used. The Probe Tm column takes the Peak Tmax reading for the appropriate base position (see the chip key column, which gives chip no: base position) in the Comparison sheet. The Ref Base column gives the reference base in the Leicester team’s reference sequence used to design the microarrays. These columns are then repeated in the BASES NOT SCORING AS 'N' (all flank 19 bases scored) box which, as the name suggests, covers all those bases and the surrounding 19bp that did give an interpretable signal in the ArrayDASH data.

The ReferenceNumbers sheet was used as a working scratch pad to align all the relevant base positions, probe numbers and reference bases, as well as to allow the calculation of the probe GC percentages.

Accessing Numbers and Pages files.
These can be accessed by setting up a free iCloud account, uploading the files to it and then using the appropriate program to open them.
References

Adachi, A., Gendelman, H., Koenig, S., Folks, T., Willey, R., Rabson, A., Martin, M. (1986). Production of acquired immunodeficiency syndrome-associated retrovirus in human and nonhuman cells transfected with an infectious molecular clone. Journal of Virology 59(2), pp284-291.

Marcy, Y., Cousin, P., Rattier, M., Cerovic, G., Escalier, G., Béna, G., Guéron, M., McDonagh, L., Boulaire, F., Bénisty, H., Weisbuch, C., Avarre, J-C. (2008). Innovative integrated system for real-time measurement of hybridization and melting on standard format microarrays. Biotechniques 44(7), pp913-920. https://dx.doi.org/10.2144/000112758

Taylor, S., Smith, S., Windle, B., Guiseppi-Elie, A. (2003). Impact of surface chemistry and blocking strategies on DNA microarrays. Nucleic Acids Research 31.