publications
2024
- [Sci. Data] A dataset of alternately located segments in protein crystal structures. Aviv A Rosenberg, Ailie Marx, and Alexander M Bronstein. Scientific Data, Jul 2024
Protein Data Bank (PDB) files list the relative spatial location of atoms in a protein structure as the final output of the process of fitting and refining to experimentally determined electron density measurements. Where experimental evidence exists for multiple conformations, atoms are modelled in alternate locations. Programs reading PDB files commonly ignore these alternate conformations by default, leaving users oblivious to the presence of alternate conformations in the structures they analyze. This has led to underappreciation of their prevalence and under-characterisation of their features, and has limited access to this high-resolution data representing structural ensembles. We have trawled PDB files to extract structural features of residues with alternately located atoms. The output includes the distance between alternate conformations and identifies the location of these segments within the protein chain and in proximity to all other atoms within a defined radius. This dataset should be of use in efforts to predict multiple structures from a single sequence and support studies investigating protein flexibility and its association with protein function.
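A minimal sketch of the kind of extraction described above, assuming PDB-format text (not mmCIF) with the fixed-column ATOM/HETATM layout of the PDB format specification, where column 17 holds the alternate location indicator. The names `parse_atoms` and `altloc_distances` are hypothetical and are not part of the released dataset or any library; the sketch only reports the largest pairwise distance between the alternate positions of each atom and ignores the neighbourhood analysis.

```python
import math
from collections import defaultdict

def parse_atoms(pdb_lines):
    """Yield (chain, resseq, icode, atom_name, altloc, xyz) for ATOM/HETATM records.

    Slices follow the fixed-column PDB layout, written as 0-based Python indices."""
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atom_name = line[12:16].strip()
        altloc = line[16].strip()  # blank when the atom has no alternate location
        chain = line[21].strip()
        resseq = int(line[22:26])
        icode = line[26].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        yield chain, resseq, icode, atom_name, altloc, xyz

def altloc_distances(pdb_lines):
    """Map (chain, resseq, icode, atom_name) to the largest distance (in Å) between
    that atom's alternate locations; atoms without alternates are skipped."""
    positions = defaultdict(dict)
    for chain, resseq, icode, atom_name, altloc, xyz in parse_atoms(pdb_lines):
        if altloc:
            positions[(chain, resseq, icode, atom_name)][altloc] = xyz
    distances = {}
    for key, locs in positions.items():
        coords = list(locs.values())
        if len(coords) < 2:
            continue
        distances[key] = max(
            math.dist(a, b) for i, a in enumerate(coords) for b in coords[i + 1:]
        )
    return distances

# Toy input: a serine CB modelled at two locations (A, B) 1 Å apart, plus a CA with no alternates.
example_lines = [
    "ATOM      1  CB ASER A  10      10.000  20.000  30.000  0.50 10.00",
    "ATOM      2  CB BSER A  10      11.000  20.000  30.000  0.50 10.00",
    "ATOM      3  CA  SER A  10      12.000  21.000  30.000  1.00 10.00",
]
print(altloc_distances(example_lines))  # {('A', 10, '', 'CB'): 1.0}
```

Keying on atom identity rather than on residue name keeps alternates together even when the alternate locations carry different residue names.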
- [AISTATS] Vector Quantile Regression on Manifolds. Marco Pegoraro, Sanketh Vedula, Aviv A Rosenberg, Irene Tallini, Emanuele Rodolà, and 1 more author. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, Jul 2024
Quantile regression (QR) is a statistical tool for distribution-free estimation of conditional quantiles of a target variable given explanatory features. QR is limited by the assumption that the target distribution is univariate and defined on a Euclidean domain. Although the notion of quantiles was recently extended to multivariate distributions, QR for multivariate distributions on manifolds remains underexplored, even though many important applications inherently involve data distributed on, e.g., spheres (climate and geological phenomena) and tori (dihedral angles in proteins). By leveraging optimal transport theory and c-concave functions, we meaningfully define conditional vector quantile functions of high-dimensional variables on manifolds (M-CVQFs). Our approach allows for quantile estimation, regression, and computation of conditional confidence sets and likelihoods. We demonstrate the approach’s efficacy and provide insights regarding the meaning of non-Euclidean quantiles through synthetic and real data experiments.
2023
- [PNAS] An amino domino model described by a cross peptide bond Ramachandran plot defines amino acid pairs as local structural units. Aviv A. Rosenberg, Nitsan YehiShalom, Ailie Marx, and Alex M Bronstein. Proceedings of the National Academy of Sciences, Jul 2023
Protein structure, both at the global and local level, dictates function. Proteins fold from chains of amino acids, forming secondary structures, α-helices and β-strands, that, at least for globular proteins, subsequently fold into a three-dimensional structure. Here, we show that a Ramachandran-type plot focusing on the two dihedral angles separated by the peptide bond, and entirely contained within an amino acid pair, defines a local structural unit. We further demonstrate the usefulness of this cross-peptide-bond Ramachandran plot by showing that it captures β-turn conformations in coil regions, that traditional Ramachandran plot outliers fall into occupied regions of our plot, and that thermophilic proteins prefer specific amino acid pair conformations. Further, we demonstrate experimentally that the effect of a point mutation on backbone conformation and protein stability depends on the amino acid pair context, i.e., the identity of the adjacent amino acid, in a manner predictable by our method.
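The angle pair behind a cross-peptide-bond plot can be sketched as follows, assuming it consists of ψ of residue i (atoms N, CA, C of residue i plus N of residue i+1) and φ of residue i+1 (C of residue i plus N, CA, C of residue i+1), i.e. the two backbone dihedrals flanking the C–N peptide bond. The dihedral uses the standard atan2 formula; the residue dictionaries and function names are hypothetical, the toy coordinates are not a realistic backbone, and the sketch does not reproduce the paper's analysis.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in [-180, 180], about the p1-p2 axis."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def cross_bond_angles(res_i, res_next):
    """Return (psi_i, phi_{i+1}), the dihedrals on either side of the C_i-N_{i+1} bond.

    Each residue is a dict with 'N', 'CA' and 'C' coordinates (hypothetical format)."""
    psi_i = dihedral(res_i["N"], res_i["CA"], res_i["C"], res_next["N"])
    phi_next = dihedral(res_i["C"], res_next["N"], res_next["CA"], res_next["C"])
    return psi_i, phi_next

# Synthetic geometry chosen so that psi_i is trans (±180) and phi_{i+1} is +90.
res_i = {"N": (1.0, 0.0, 0.0), "CA": (0.0, 0.0, 0.0), "C": (0.0, 0.0, 1.0)}
res_next = {"N": (-1.0, 0.0, 1.0), "CA": (-1.0, 0.0, 2.0), "C": (-1.0, 1.0, 2.0)}
print(cross_bond_angles(res_i, res_next))
```

Each (ψ_i, φ_{i+1}) pair is then one point on the cross-peptide-bond plot, just as each (φ_i, ψ_i) pair is one point on a traditional Ramachandran plot.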
- [ICML Workshop] Continuous Vector Quantile Regression. Sanketh Vedula, Irene Tallini, Aviv A. Rosenberg, Marco Pegoraro, Emanuele Rodolà, and 2 more authors. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, Jul 2023
Vector quantile regression (VQR) estimates the conditional vector quantile function (CVQF), a fundamental quantity which fully represents the conditional distribution of $\mathbf{Y} \mid \mathbf{X}$. VQR is formulated as an optimal transport (OT) problem between a uniform $\mathbf{U} \sim \mu$ and the target $(\mathbf{X}, \mathbf{Y}) \sim \nu$, the solution of which is a unique transport map, co-monotonic with $\mathbf{U}$. Recently, non-linear VQR (NL-VQR) has been proposed to support the estimation of non-linear CVQFs, together with fast solvers which enabled the use of this tool in practical applications. Despite its utility, the scalability and estimation quality of NL-VQR are limited by a discretization of the OT problem onto a grid of quantile levels. We propose a novel continuous formulation and parametrization of VQR using partial input-convex neural networks (PICNNs). Our approach allows for accurate, scalable, differentiable and invertible estimation of non-linear CVQFs. We further demonstrate, theoretically and experimentally, how continuous CVQFs can be used for general statistical inference tasks: estimation of likelihoods, CDFs, confidence sets, coverage, sampling, and more. This work is an important step towards unlocking the full potential of VQR.
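To make the optimal-transport view concrete, the sketch below drops the conditioning on X and solves a tiny discrete OT problem by brute force: n quantile levels u_k are matched one-to-one with n samples y so as to maximise Σ⟨u_k, y_σ(k)⟩, which for equal weights is equivalent to minimising the total squared distance. This is an illustrative stand-in, not the PICNN-based method of the paper or any released solver; `vector_quantiles` is a hypothetical name, and enumerating permutations is only feasible for a handful of points. In one dimension the optimal matching pairs the k-th smallest level with the k-th smallest sample, recovering ordinary empirical quantiles, which is the co-monotonicity property in its simplest form.

```python
from itertools import permutations

def vector_quantiles(u_levels, samples):
    """Return samples reordered so that result[k] is the quantile of level u_levels[k].

    Solves the discrete OT (assignment) problem
        max over permutations sigma of sum_k <u_k, y_sigma(k)>
    by exhaustive search, so it is only usable for very small inputs."""
    n = len(samples)
    if len(u_levels) != n:
        raise ValueError("need as many quantile levels as samples")

    def score(perm):
        # Total inner product between each level and the sample assigned to it.
        return sum(
            sum(a * b for a, b in zip(u_levels[k], samples[perm[k]]))
            for k in range(n)
        )

    best = max(permutations(range(n)), key=score)
    return [samples[best[k]] for k in range(n)]

# 1-D check: the optimal transport map is monotone, i.e. it sorts the samples.
levels = [(0.1,), (0.3,), (0.5,), (0.7,), (0.9,)]
ys = [(3.0,), (1.0,), (5.0,), (2.0,), (4.0,)]
print(vector_quantiles(levels, ys))  # [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
```

The same function accepts d-dimensional tuples, where no sorting is available and the matching itself defines the vector quantiles.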
- [ICLR] Fast Nonlinear Vector Quantile Regression. Aviv A. Rosenberg, Sanketh Vedula, Yaniv Romano, and Alex M. Bronstein. In The Eleventh International Conference on Learning Representations, Jul 2023
Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable Y given explanatory features X. A limitation of QR is that it is only defined for scalar target variables, due to the formulation of its objective function, and since the notion of quantiles has no standard definition for multivariate distributions. Recently, vector quantile regression (VQR) was proposed as an extension of QR for vector-valued target variables, thanks to a meaningful generalization of the notion of quantiles to multivariate distributions via optimal transport. Despite its elegance, VQR is arguably not applicable in practice due to several limitations: (i) it assumes a linear model for the quantiles of the target Y given the features X; (ii) its exact formulation is intractable even for modestly-sized problems in terms of target dimensions, number of regressed quantile levels, or number of features, and its relaxed dual formulation may violate the monotonicity of the estimated quantiles; (iii) no fast or scalable solvers for VQR currently exist. In this work we fully address these limitations, namely: (i) We extend VQR to the non-linear case, showing substantial improvement over linear VQR; (ii) We propose vector monotone rearrangement, a method which ensures the quantile functions estimated by VQR are monotone functions; (iii) We provide fast, GPU-accelerated solvers for linear and nonlinear VQR which maintain a fixed memory footprint, and demonstrate that they scale to millions of samples and thousands of quantile levels; (iv) We release an optimized Python package of our solvers to encourage widespread use of VQR in real-world applications.
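For the scalar case, the QR objective is the pinball (check) loss. The sketch below, with hypothetical function names and no features X (i.e. unconditional), minimises its empirical mean over candidate values drawn from the sample, which is enough because the mean loss is piecewise linear with kinks at the sample points, and thereby recovers an empirical τ-quantile. The loss treats positive and negative residuals y − q asymmetrically, which relies on the ordering of a scalar target and has no direct analogue for vector-valued Y.

```python
def pinball_loss(y, q, tau):
    """Pinball (check) loss of predicting q for target y at quantile level tau in (0, 1)."""
    r = y - q
    return max(tau * r, (tau - 1.0) * r)

def empirical_quantile(ys, tau):
    """Return the sample value minimising the mean pinball loss at level tau."""
    return min(ys, key=lambda q: sum(pinball_loss(y, q, tau) for y in ys) / len(ys))

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(empirical_quantile(data, 0.5))  # 3.0 (the median)
print(empirical_quantile(data, 0.9))  # 5.0
```

Conditional QR replaces the constant q with a model q(X) and minimises the same loss over the model's parameters.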
2022
- [Scientific Reports] Machine learning approaches demonstrate that protein structures carry information about their genetic coding. Linor Ackerman-Schraier, Aviv A. Rosenberg, Ailie Marx, and Alex M. Bronstein. Scientific Reports, Jul 2022
- [NatComm] Codon-specific Ramachandran plots show amino acid backbone conformation depends on identity of the translated codon. Aviv A. Rosenberg, Ailie Marx, and Alex M. Bronstein. Nature Communications, May 2022
Synonymous codons translate into chemically identical amino acids. Once considered inconsequential to the formation of the protein product, there is evidence to suggest that codon usage affects co-translational protein folding and the final structure of the expressed protein. Here we develop a method for computing and comparing codon-specific Ramachandran plots and demonstrate that the backbone dihedral angle distributions of some synonymous codons are distinguishable with statistical significance for some secondary structures. This shows that there exists a dependence between codon identity and backbone torsion of the translated amino acid. Although these findings cannot pinpoint the causal direction of this dependence, we discuss the vast biological implications should coding be shown to directly shape protein conformation and demonstrate the usefulness of this method as a tool for probing associations between codon usage and protein structure. Finally, we urge the inclusion of exact genetic information in structural databases.
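As a simplified illustration of testing whether two synonymous codons have distinguishable backbone angle distributions, the sketch below runs a permutation test on (φ, ψ) samples using the distance between per-angle circular means as the test statistic. This statistic is a swapped-in stand-in chosen for brevity, not the distribution comparison used in the paper; the function names and the toy angle samples are hypothetical.

```python
import math
import random

def circular_mean(angles_deg):
    """Mean direction of angles given in degrees, returned in [-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))

def angular_gap(a, b):
    """Smallest absolute difference between two angles in degrees (wraps around 360)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def torus_gap(xs, ys):
    """Distance between the (phi, psi) circular means of two samples of angle pairs."""
    d_phi = angular_gap(circular_mean([p for p, _ in xs]), circular_mean([p for p, _ in ys]))
    d_psi = angular_gap(circular_mean([s for _, s in xs]), circular_mean([s for _, s in ys]))
    return math.hypot(d_phi, d_psi)

def permutation_test(xs, ys, n_perm=999, seed=0):
    """Return (observed statistic, permutation p-value) for H0: both samples share a distribution."""
    rng = random.Random(seed)
    observed = torus_gap(xs, ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of samples to the two codons
        if torus_gap(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy (phi, psi) samples for two hypothetical codons of the same amino acid.
codon_a = [(-63.0 + (k % 5), -42.0 + (k % 3)) for k in range(30)]
codon_b = [(-75.0 + (k % 5), -30.0 + (k % 3)) for k in range(30)]
print(permutation_test(codon_a, codon_b))
```

With many codon pairs and secondary-structure classes, the resulting p-values would also need a multiple-testing correction.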
2021
- [PNAS] Meeting the unmet needs of clinicians from AI systems showcased for cardiology with deep-learning–based ECG analysis. Yonatan Elul, Aviv A. Rosenberg, Assaf Schuster, Alex M. Bronstein, and Yael Yaniv. Proceedings of the National Academy of Sciences, May 2021
The use of artificial intelligence (AI) in medicine, particularly deep learning, has gained considerable attention recently. Although some works boast superior capabilities compared to clinicians, actual deployments of AI systems in the clinic are scarce. We describe four important gaps on the machine-learning side responsible for this discrepancy by first formulating them in a way that is actionable by AI researchers and then systematically addressing these needs. Aiming beyond the search for better model architectures or improved accuracy, we focus directly on the challenges of clinical usefulness as stated by medical professionals in the literature. Our results show that deep-learning systems can be robust, trustworthy, explainable, and transparent while retaining the superior level of performance these algorithms are known for.
Despite their great promise, artificial intelligence (AI) systems have yet to become ubiquitous in the daily practice of medicine, largely due to several crucial unmet needs of healthcare practitioners. These include lack of explanations in clinically meaningful terms, handling the presence of unknown medical conditions, and transparency regarding the system’s limitations, both in terms of statistical performance as well as recognizing situations for which the system’s predictions are irrelevant. We articulate these unmet clinical needs as machine-learning (ML) problems and systematically address them with cutting-edge ML techniques. We focus on electrocardiogram (ECG) analysis as an example domain in which AI has great potential and tackle two challenging tasks: the detection of a heterogeneous mix of known and unknown arrhythmias from ECG and the identification of underlying cardio-pathology from segments annotated as normal sinus rhythm recorded in patients with an intermittent arrhythmia.
We validate our methods by simulating a screening for arrhythmias in a large-scale population while adhering to statistical significance requirements. Specifically, our system 1) visualizes the relative importance of each part of an ECG segment for the final model decision; 2) upholds specified statistical constraints on its out-of-sample performance and provides uncertainty estimation for its predictions; 3) handles inputs containing unknown rhythm types; and 4) handles data from unseen patients while also flagging cases in which the model’s outputs are not usable for a specific patient. This work represents a significant step toward overcoming the limitations currently impeding the integration of AI into clinical practice in cardiology and medicine in general.
All study data are included in the article and/or SI Appendix.
- [Nat. Dig. Med.] Digital oximetry biomarkers for assessing respiratory function: standards of measurement, physiological interpretation, and clinical use. Jeremy Levy, Daniel Álvarez, Aviv A Rosenberg, Alexandra Alexandrovich, Félix Del Campo, and 1 more author. npj Digital Medicine, May 2021
Pulse oximetry is routinely used to non-invasively monitor oxygen saturation levels. A low oxygen level in the blood means low oxygen in the tissues, which can ultimately lead to organ failure. Yet, in contrast to heart rate variability measures, a field which has seen the development of stable standards and advanced toolboxes and software, no such standards and open tools exist for continuous oxygen saturation time series variability analysis. The primary objective of this research was to identify, implement and validate key digital oximetry biomarkers (OBMs) for the purpose of creating a standard and associated reference toolbox for continuous oximetry time series analysis. We review the sleep medicine literature to identify clinically relevant OBMs. We implement these biomarkers and demonstrate their clinical value within the context of obstructive sleep apnea (OSA) diagnosis on a total of n = 3806 individual polysomnography recordings totaling 26,686 h of continuous data. A total of 44 digital oximetry biomarkers were implemented. Reference ranges for each biomarker are provided for individuals with mild, moderate, and severe OSA and for non-OSA recordings. Linear regression analysis between biomarkers and the apnea hypopnea index (AHI) showed a high correlation, which reached $\bar{R}^2 = 0.82$. The resulting Python OBM toolbox, denoted “pobm”, was contributed to the open software PhysioZoo (physiozoo.org). Studying the variability of the continuous oxygen saturation time series using pobm may provide information on the underlying physiological control systems and enhance our understanding of the manifestations and etiology of diseases, with emphasis on respiratory diseases.
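One commonly used oximetry biomarker is the cumulative time spent below 90% saturation (often called CT90). The sketch below computes it, together with the mean and minimum SpO2, from a uniformly sampled series; it assumes values in percent, treats `None` as a missing sample, and uses a hypothetical function name rather than the API of the `pobm` toolbox.

```python
def oximetry_summary(spo2, threshold=90.0):
    """Summarise a uniformly sampled SpO2 series given in percent.

    Returns the mean and minimum saturation and the percentage of valid
    samples strictly below `threshold` (CT90 when threshold is 90).
    Missing samples are given as None and ignored."""
    valid = [s for s in spo2 if s is not None]
    if not valid:
        raise ValueError("no valid SpO2 samples")
    below = sum(1 for s in valid if s < threshold)
    return {
        "mean": sum(valid) / len(valid),
        "min": min(valid),
        "pct_below": 100.0 * below / len(valid),
    }

series = [95, 94, 89, 88, None, 92, 96, 85, 97, 98, 99]
print(oximetry_summary(series))  # {'mean': 93.3, 'min': 85, 'pct_below': 30.0}
```

Because the series is assumed to be uniformly sampled, the percentage of samples below the threshold equals the percentage of recording time below it.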
- [Frontiers] Opening the Schrödinger Box: Short- and Long-Range Mammalian Heart Rate Variability. Ido Weiser-Bitoun, Moran Davoodi, Aviv A Rosenberg, Alexandra Alexandrovich, and Yael Yaniv. Frontiers in Physiology, May 2021
2020
- [Scientific Reports] Signatures of the autonomic nervous system and the heart’s pacemaker cells in canine electrocardiograms and their applications to humans. Aviv A Rosenberg, Ido Weiser-Bitoun, George E Billman, and Yael Yaniv. Nature Scientific Reports, May 2020
Heart rate and heart rate variability (HRV) are mainly determined by the autonomic nervous system (ANS), which interacts with receptors on the sinoatrial node (SAN; the heart’s primary pacemaker), and by the “coupled-clock” system within the SAN cells. HRV changes are associated with cardiac diseases. However, the relative contributions of the ANS and SAN to HRV are not clear, impeding effective treatment. To discern the SAN’s contribution, we performed HRV analysis on canine electrocardiograms containing basal and ANS-blockade segments. We also analyzed human electrocardiograms of atrial fibrillation and heart failure patients, as well as healthy aged subjects. Finally, we used a mathematical model to simulate HRV under decreased “coupled-clock” regulation. We found that (a) in canines, the SAN and ANS contribute mainly to long- and short-term HRV, respectively; (b) there is evidence suggesting a similar relative SAN contribution in humans; (c) SAN features can be calculated from beat-intervals obtained in-vivo, without intervention; (d) ANS contribution can be modeled by sines embedded in white noise; (e) HRV changes associated with cardiac diseases and aging can be interpreted as deterioration of both SAN and ANS; and (f) SAN clock-coupling can be estimated from changes in HRV. This may enable future non-invasive diagnostic applications.
2018
- [CinC] Adding Two Dimensions to Heart Rate Variability Research. Joachim A Behar, Ori Shemla, Ido Weiser-Bitoun, Aviv A Rosenberg, and Yael Yaniv. In 2018 Computing in Cardiology Conference (CinC), May 2018
Introduction: Heart rate variability (HRV) analysis tools have mainly been available for analysis of heart rate derived from human electrocardiograms. We explore extending HRV analysis to two additional dimensions: (1) analysis across multiple mammalian species and (2) analysis across different levels of integration, for example, sinoatrial tissue. Methods: We analyzed the beating rate variability (BRV) across the two additional dimensions using the PhysioZoo computer program that we recently introduced. We used published databases of electrocardiograms from four mammal types: human (n=18), dog (n=17), rabbit (n=4) and mouse (n=8). We computed the BRV measures for each. We also show how the PhysioZoo program can be used for the analysis of sinoatrial node tissue BRV. Results: The study of typical mammalian heart and respiration rates (obtained from the dominant high-frequency peak) revealed a linear relationship between these two quantities. Analysis of the rabbit sinoatrial node tissue BRV showed that it had reduced overall variability when compared to in vivo heart BRV.
- [MSc. Thesis] Non-Invasive In-Vivo Analysis of Intrinsic Clock-Like Pacemaker Mechanisms: Decoupling Neural Input Using Heart Rate Variability Measurements. Technion, Israel Institute of Technology, May 2018
Heart diseases account for a quarter of all deaths each year in the US and are also an economic burden, with an estimated expenditure for treatment of almost $100B annually in the US alone. Cardiovascular disease mortality rate is correlated with an increase in heart rate, which is regulated by both the autonomic nervous system (ANS) and the sinoatrial node (SAN) cells in the heart. The heart rate is highly variable and never reaches a steady state, even at rest, a phenomenon known as heart rate variability (HRV). Many studies have shown that loss of this variability is strongly associated with morbidity and mortality. By using pharmacological denervation, a method of temporarily blocking the ANS, and applying HRV analysis, we aim to study the contribution of the SAN to HRV. We acquired canine ECG data containing both basal (n=27) and denervated segments (n=20). We applied an automated ECG segmentation algorithm to extract the segments from each record. We used a custom R-peak detector, rqrs, based on PhysioNet’s gqrs, to detect R-peaks in the data and produce an RR-interval time series. We excluded ectopic beats using an automated algorithm and then applied HRV analysis to the resulting intervals. We implemented all major HRV techniques, which can be categorized into time-domain, frequency-domain (spectral) and nonlinear methods (which quantify physiological complexity), and used them to extract HRV features from both the basal and denervated data sets. We implemented all signal processing and HRV analysis methods as an open-source MATLAB toolbox, rhrv, and additionally provided a GUI, PhysioZoo, which enables HRV analysis of animal data and comes with an annotated animal database. We have shown that the rqrs peak detector provides accurate detections for annotated human ECG data (F1=93.4) and for annotated ECG records from our canine dataset (F1=98.7). Moreover, we adapted HRV analysis techniques to the canine data where necessary, for example by providing an automatic method of adapting the frequency bands for spectral HRV analysis.
HRV analysis of basal vs. denervated data shows that (1) time-domain HRV is significantly reduced after denervation; (2) the SAN contributes spectral power mainly in the very-low-frequency (VLF) band; (3) the SAN contributes most of the physiological complexity of the heart rate, specifically the long-term changes occurring over many beats; (4) the ANS mainly influences the short-term, beat-to-beat variability of the heart rate; (5) the contribution of the ANS to the heart rate signal can be modeled as two sine waves, at specific frequencies corresponding to periodic autonomic regulation, embedded in white noise. Moreover, we suggest clinical indices of the state and function of the SAN, computed directly from basal ECG data, by measuring spectral power in the VLF band and multiscale entropy (MSE) values at high scales. We conclude that by applying HRV analysis to regular ECG data, SAN function can be observed even without pharmacological denervation. This has the potential to enable future non-invasive heart monitoring solutions that can be used, e.g., for early detection of SAN dysfunction.
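The time-domain HRV measures mentioned above can be sketched on a series of NN intervals in milliseconds. SDNN, RMSSD and pNNx below follow their standard textbook definitions; the function name is hypothetical, this is not code from the rhrv MATLAB toolbox, and the 50 ms pNN threshold is the conventional human value, exposed as a parameter because thresholds may need adapting for other species such as dogs.

```python
import math
import statistics

def time_domain_hrv(rr_ms, pnn_threshold_ms=50.0):
    """Standard time-domain HRV measures from NN intervals in milliseconds.

    SDNN:  sample standard deviation of the intervals.
    RMSSD: root mean square of successive differences.
    pNNx:  percentage of successive differences whose magnitude exceeds the threshold."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "sdnn": statistics.stdev(rr_ms),
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "pnn": 100.0 * sum(1 for d in diffs if abs(d) > pnn_threshold_ms) / len(diffs),
    }

print(time_domain_hrv([800, 810, 790, 800]))
```

Ectopic beats should be removed before calling such a function, since a single premature beat inflates both RMSSD and pNNx.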
- [Frontiers] PhysioZoo: a novel open access platform for heart rate variability analysis of mammalian electrocardiographic data. Joachim A Behar, Aviv A Rosenberg, Ido Weiser-Bitoun, Ori Shemla, Alexandra Alexandrovich, and 2 more authors. Frontiers in Physiology, May 2018
- [Frontiers] A universal scaling relation for defining power spectral bands in mammalian heart rate variability analysis. Joachim A Behar, Aviv A Rosenberg, Ori Shemla, Kevin R Murphy, Gideon Koren, and 2 more authors. Frontiers in Physiology, May 2018
2017
- [CinC] Rhythm and quality classification from short ECGs recorded using a mobile device. Joachim A Behar, Aviv A Rosenberg, Yael Yaniv, and Julien Oster. In 2017 Computing in Cardiology (CinC), May 2017
Introduction: Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia. Its prevalence is 1–2% of the general population and it is associated with increased risk of mortality and morbidity. Methods: The AliveCor mobile electrocardiogram (ECG) device was used to collect data. The PhysioNet Challenge aimed to create an intelligent algorithm for automated rhythm and quality classification. A database of 8,528 single-lead ECGs was used for training and a closed database of 3,658 ECG recordings was used for testing the participants’ algorithms on the Challenge server. The RR-interval time series was first estimated using an R-peak detector. Signal quality was estimated on a second-by-second basis and the continuous sub-segment with the highest quality was selected for further analysis. A number of features were estimated: heart rate variability (time-domain based, fragmentation, coefficient of sample entropy, etc.), ECG morphology (QRS length, QT interval, etc.) and the presence of ectopic beats. The features were used to train support vector machine classifiers in a one-vs.-rest approach. Results: For the final score of the challenge we obtained an overall F1 measure on the test set of 0.80. Conclusion: The feature-based machine learning approach showed high performance in distinguishing between the different rhythms represented in the Challenge. This opens the horizon for computer-automated interpretation of single-lead mobile ECG.
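The overall F1 measure reported above can be sketched as follows, assuming the 2017 PhysioNet/CinC Challenge convention of averaging the per-class F1 over normal rhythm (N), AF (A) and other rhythm (O), with noisy recordings (~) not scored as a class of their own but still counting as errors when misclassified into a scored class. The label strings and function names are assumptions for illustration, not the official scoring code.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 = 2TP / (2TP + FP + FN) for one class, treating all other labels as negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def challenge_score(y_true, y_pred, scored=("N", "A", "O")):
    """Mean per-class F1 over the scored rhythm classes."""
    return sum(f1_for_class(y_true, y_pred, c) for c in scored) / len(scored)

truth = ["N", "N", "A", "O", "~"]
preds = ["N", "A", "A", "O", "N"]
print(challenge_score(truth, preds))  # (1/2 + 2/3 + 1) / 3 = 13/18
```

In a one-vs.-rest setup like the one described, each class's SVM contributes a score, and the predicted label is typically the class with the highest score.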