Protein glycosylation is an important post-translational modification. It enhances the functional diversity of proteins and influences their half-life and biological activity. Defective glycosylation often leads to multisystem disease and underlies the expanding group of ‘Congenital disorders of glycosylation’ (CDG), which are predominantly disorders of N-linked glycosylation. Another rapidly growing group of disorders comprises defects in O-linked glycosylation, including a subset of dystroglycanopathies. Current diagnostic strategies for glycosylation disorders are complicated by the variable clinical phenotype of many of the diseases. Biochemical tests such as isoelectric focusing of transferrin and apolipoprotein CIII are used to assess a patient’s glycoform profile before in-depth enzyme and genetic analysis is initiated. Whilst glycoform profiling has been instrumental in screening for many glycosylation disorders, there is a need for a more sensitive and informative test. This short review gives an overview of recent methods used in glycobiology research that could be used to devise such a test, which alongside currently used diagnostic tests should further facilitate the delineation of CDG subtypes. It outlines a potential strategy using marker glycopeptides to develop a mass spectrometry based assay that could be implemented in clinical diagnostic laboratories.
Keywords: Congenital disorders of glycosylation; CDG; Glycosylation; Glycopeptide; Dystroglycanopathy
CDG: Congenital Disorders of Glycosylation; PTM: Post-Translational Modification; Apo CIII: Apolipoprotein CIII; MALDI TOF MS: Matrix Assisted Laser Desorption Ionisation Time of Flight Mass Spectrometry; A1AT: α-1-Anti-Trypsin; HILIC: Hydrophilic Interaction Liquid Chromatography
The human genome sequencing project demonstrated 30-50,000 genes, but the human ‘proteome’ shows we have > 500,000 proteins. The discrepancy between the number of genes and the number of proteins arises from a higher order of complexity of protein products due to ‘post translational modifications’ (PTMs). PTMs occur after proteins have been synthesised, when other molecules such as glycans and phosphate groups are attached to the protein. These are vital for its function or delivery to its site of action. Analysis of the genetic code of a protein cannot predict these modifications; thus, several diseases require further characterisation by other means, such as mass spectrometry, to help understand the cause of the malfunction of a protein.
The most common and complex form of post translational modification is ‘glycosylation’, the enzymatic addition of carbohydrates to proteins (or lipids). It is estimated that 1% of human genes are required for this specific process and that more than one half of all proteins are glycosylated according to estimates based on the SwissProt database, as are >90% of plasma proteins. In humans, protein-linked glycans can be divided into 4 main categories: N-linked (linkage to the amide group of Asparagine), O-linked (linkage to the hydroxyl group of Serine or Threonine), the very rare C-linked (linkage to a carbon atom of the Tryptophan indole ring) and formation of GPI anchors. Protein glycosylation is an important post-translational modification: it enhances the functional diversity of proteins and influences their biological activity. A wide range of functions for glycans has been described, from structural roles to participation in molecular trafficking, self-recognition and clearance.
Disorders of glycosylation encompass a large spectrum of inherited diseases which principally compose the group of ‘Congenital Disorders of Glycosylation’ (CDG). The first cases were described in 1980. The typical clinical features of phosphomannomutase (PMM)-CDG, previously CDG-Ia, include dysmorphic features such as inverted nipples, elongated fingers and abnormal fat distribution, structural abnormalities such as cerebellar hypoplasia, and predominantly neurological problems such as developmental delay and epilepsy. A broad range of organ systems might also be affected, causing gastrointestinal symptoms, hypoglycaemia, hypogonadism and skeletal abnormalities. CDGs are an expanding collection of mostly autosomal recessive inherited multisystemic disorders; a recent review on CDG reports that there are now more than 45 distinct CDG disorders identified. These are nowadays subgrouped into defects of protein N-glycosylation, defects of protein O-glycosylation, defects of lipid glycosylation and of GPI anchor glycosylation, and defects of multiple glycosylation pathways and others.
N-glycosylation is known to occur at a particular sequon in the protein amino acid sequence, Asn-X-Ser/Thr (N-X-S/T), where X can be any amino acid except proline; however, the presence of a sequon does not necessarily mean that the site will be glycosylated. Abnormal glycosylation may be due to protein defects located in the early N-glycan pathway (in the cytoplasm or the ER), up to the transfer of the glycan to the protein, or to defects in the processing of N-glycans on the glycosylated protein, situated mainly in the Golgi compartment.
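The sequon rule described above (Asn, then any residue except proline, then Ser or Thr) can be illustrated with a minimal Python sketch. The function name `find_sequons` and the toy peptide are hypothetical and chosen only for illustration; a match marks a potential site, not a confirmed glycosylation site.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons be reported one by one.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start a potential sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]

# Hypothetical toy peptide, not a real protein sequence.
toy = "MKNGSAANPTLLNVTQ"
print(find_sequons(toy))  # [3, 13]; the Asn at position 8 is followed by Pro
```

Positions are 1-based to match the way glycosylation sites are usually quoted. Whether a predicted site is actually occupied still has to be established experimentally.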
O-glycan biosynthesis is an even more complex process with an enormous number of genes involved. Seven different types of O-linked glycans are known in humans, classified according to the first sugar attached to a Serine or Threonine residue of a protein. Although there is a consensus sequon for N-glycosylation, there is no sequon for O-glycosylation; however, there are some bioinformatics websites such as the NetOGlyc 3.1 server (https://www.cbs.dtu.dk/services/NetOGlyc/) that predict sites of potential mucin type O-glycans, based on information from known and documented O-glycosylated proteins. Some O-glycosylation disorders affect only a particular O-glycan type, such as the dystroglycanopathies, where primarily the O-mannosylation of alpha-dystroglycan is affected. Certain disorders affect several O-glycan types, and others also affect the biosynthesis of other glycoconjugates. The primary defect of many of these disorders may lie in glycan-specific transferases, the biosynthesis of nucleotide sugars and/or their transport to the ER/Golgi. The discovery of defects in the conserved oligomeric Golgi complex (COG), resulting in a disruption of Golgi trafficking, led to a new distinctive group of several combined glycosylation defects, affecting N- and O-glycosylation.
The clinical variations within a disorder and among the different inborn errors of O-glycan metabolism are enormous and the disorders described so far may only be the beginning. One such class of O-linked disorder currently emerging as a novel form of CDG is a subset of the dystroglycanopathies. Alpha dystroglycan is a large glycoprotein of approximately 156 kDa in skeletal muscle and is heavily O-glycosylated. Many of the described mutations for muscular dystrophy affect the O-glycosylation pathways for the biosynthesis of α-dystroglycan, and recently a new type of CDG has been described that results in a defect in the O-mannosyl pathway. There is a lack of a good diagnostic test for defective O-mannosyl glycosylation, as to date there are no known easily available proteins with this form of O-glycosylation that can be tested. Clinicians have to rely on muscle biopsy material for diagnosis. However, some forms of muscular dystrophy have been reported to have defects in the glycosylation pathway which affect not just O-mannosyl but also GlcNAc O-glycosylation. One such defect is related to LARGE mutations, which have recently been described to affect both of these forms of O-glycosylation. This has been confirmed in our own laboratories, as LARGE patient serum demonstrated a reduced molecular weight form of the heavily N- and O-glycosylated protein C1-inhibitor by western blotting. This shows that investigation of N- and O-GlcNAc glycosylation should be performed for unknown causes of muscular dystrophy, as this can help to narrow down the step in the glycosylation pathway that is affected.
Current diagnostic strategies
The clinical presentation of glycosylation defects is broad and commonly involves the central nervous system and/or presents as a multisystem disorder, often with dysmorphic features. The differential diagnosis of a glycosylation defect is mainly considered for patients with a range of clinical problems affecting different organ systems, with an as yet unidentified underlying genetic cause. The suspicion of a distinct glycosylation defect will be raised should the presenting list of clinical problems fit well with the common description of a certain CDG subtype. For several glycosylation defects, key clinical features have been identified, e.g. inverted nipples, fat pads, and cerebellar hypoplasia for PMM-CDG; protein-losing enteropathy, gastrointestinal bleeding and liver disease with normal or only minor neurological problems for PMI-CDG; or cutis laxa for ATP6V0A2-CDG. For some CDGs, distinctive clinical symptoms will point to a single CDG or a small subgroup, e.g. multiple cartilaginous exostosis for EXT1/EXT2-CDG, hyperphosphatemic familial tumoral calcinosis for GALNT3-CDG, and muscle-eye-brain disease for different dystroglycanopathies such as POMT1/POMT2-CDG, POMGNT1-CDG or FKRP-CDG. However, some of the key features might be typical for several glycosylation disorders, such as ichthyosis, which is commonly seen in MPDU1-, DOLK- or SRD5A3-CDG, all defects of multiple glycosylation pathways. A detailed review of clinical symptoms of CDG and the currently accepted route to establish a diagnosis of CDG is given by Lefeber et al.
The isoelectric focusing (IEF) pattern of transferrin is typically used to confirm defective N-glycosylation and is usually the first port of call in the process towards the diagnosis of CDG. Transferrin is a 79 kDa serum glycoprotein with two N-linked glycans and is highly abundant in plasma. Figure 1 shows the typical patterns seen in CDG type I (assembly defects) and II (defects of glycan processing). Causes of false positive results, such as natural polymorphic variants of transferrin and secondary glycosylation disorders (mainly galactosemia and hereditary fructose intolerance), must be excluded. Polymorphic variants can be excluded by further IEF testing after pre-incubation with neuraminidase, which removes the terminal sialic acids so that any remaining change can be attributed to a polymorphism and not to glycosylation. A positive transferrin test leads on to further enzyme and genetic testing to identify the type of CDG that the patient may have.
Figure 1: An IEF pattern of the serum glycoprotein transferrin. Lanes 1 and 6 show normal serum transferrin, composed mostly of tetrasialotransferrin with small portions of mono-, di-, tri- and pentasialotransferrin. Lanes 2 and 3 show the IEF pattern from two CDG Ia patients. A reduction in tetrasialotransferrin is seen with a greater proportion of asialo- and disialotransferrin, causing a cathodal shift in the IEF pattern. Lanes 4 and 5 show CDG IIx defects; the IEF pattern shows an increase in mono- and trisialotransferrin fractions. Lane 7 shows a polymorphic variant eliminated by neuraminidase treatment.
If the disease is suspected to be an O-linked disorder, then the isoelectric profile of another serum glycoprotein, apolipoprotein CIII, is investigated. Apolipoprotein CIII (apo CIII) is a small glycoprotein of mass 8.7 kDa that has one mucin type O-linked glycosylation site and has been shown to be useful in detecting defective core 1 mucin type O-glycosylation.
An ideal test
Whilst transferrin has proven to be a useful marker of defective N-glycosylation, it can show a normal IEF profile in some cases of CDG. In addition, IEF only shows changes in charge and lacks information on changes in molecular weight, thus making it difficult at times to distinguish between Type I and II defects. It is becoming apparent that additional and improved tests are needed for detecting and characterising various glycosylation disorders, streamlining the most appropriate subsequent tests at an early stage of diagnosis.
A test suitable for a clinical laboratory must be robust, simple, highly specific and sensitive, and must require minimal sample preparation and, of course, minimal cost. As discussed in more detail further on, the majority of current strategies for in-depth investigation of glycans and glycopeptides require specialised, state of the art, expensive equipment, which is not practical or economic for clinical diagnostic laboratories. An additional challenge for diagnostic strategies is to find a single method that can be used to detect aberrant glycosylation in both N- and O-linked disorders.
An ideal screening tool would be able to identify site occupancy (i.e. whether a glycan is present or absent, as in CDG I) and changes in glycan structure, which would give information on CDG II conditions and O-linked disorders. Glycosylation analysis can take three different routes: i) characterisation of glycans on intact proteins, ii) structural analysis of chemically or enzymatically released glycans and iii) characterisation of glycopeptides.
The first route of ‘glycoform profiling’ is currently the method used for transferrin and apo CIII analysis. This approach could be improved by the use of a better protein or biomarker that would have a larger number of glycans to help understand the macroheterogeneity, and preferably be detectable in an easily procurable sample such as urine or plasma. Transferrin has only two N-linked glycans and apo CIII only one O-linked glycan; there is scope for the discovery and characterisation of a heavily N- and O-glycosylated protein that could be used to check if the defective glycosylation is N-linked, O-linked or both. Whilst the interpretation of a more heavily glycosylated protein may be more cumbersome by IEF analysis alone, a simple small 2D system that also shows changes in molecular weight may give additional information to aid interpretation. Small IEF strips have recently been made available by various biotech companies and can be applied to a small mini-gel system, thereby creating a mini-2D gel system that can easily be subjected to western blotting. Currently in our laboratories we are testing one such potential marker using this simple system: the plasma C1-protease inhibitor has seven N- and seven O-linked glycans that in effect double the molecular weight of the native protein.
The second route involving direct analysis of glycans does not necessarily require a specific protein. Typically whole plasma glycan analysis has been performed using Matrix Assisted Laser Desorption Ionisation Time of Flight Mass Spectrometry (MALDI TOF MS). Whilst this approach can give information on overall glycan composition it does not give information on site occupancy or microheterogeneity. The analysis of O-glycans in this way is more complex compared to N-linked glycan analyses.
The third route is the characterisation of glycopeptides. This type of analysis, on an ideal glycoprotein marker performed by mass spectrometry, could simultaneously provide information on glycan structure and glycan microheterogeneity. However, this latter method is notoriously difficult to perform due to the high molecular weight of glycopeptides.
Applying transferrin isoelectric focusing as a first line screening procedure for CDG might lead to inconclusive results: the findings might be difficult to interpret, or might be normal although the patient’s presentation is clinically highly suspicious for CDG. Such results can be taken further by testing another N-glycoprotein such as α1-anti-trypsin (A1AT). A1AT is another abundant serum protein with three known N-linked glycans. Another approach is to perform a serum ‘proteome analysis’ using 2D gel based technology. Glycoforms of various serum proteins have been extensively studied in the past using 2D PAGE [18-22]. This approach is extremely useful in taking a detailed look at a patient’s serum glycoprotein IEF and molecular weight profiles (which cannot be observed in current transferrin Phast System IEF analysis). The additional ability to observe molecular weight changes means that CDG I patients can easily be distinguished from CDG II by the presence of lower molecular weight glycoforms. Previous 2D PAGE based studies, however, were hampered by the high abundance of albumin and the limited availability of pH ranges that could be investigated. Advances in 2D PAGE technology include the development of IEF strips with high resolution over narrow pI ranges and also the 2D Difference Gel Electrophoresis technique (2D DiGE), which we have optimised for the investigation of CDG patient serum in our laboratories. 2D DiGE is a widely used technique in proteomics to look for differential protein expression. It involves the labelling of up to three different samples with fluorescent Cy dyes, which can then be run on one single gel (Figure 2). This approach is very good for looking at glycoproteins, as patient samples can be directly ‘overlaid’ with a normal profile to look for small subtle changes in charge and mass that would not be detected by comparing separate gels.
Further optimisation can be achieved by depleting serum of albumin and IgG using easily available immunoaffinity columns. This increases the detection of lower abundance proteins and also reveals proteins of similar mass and pI to albumin and IgG that would otherwise be obscured. Through our own investigations we have found that this technique can be optimised to detect glycoproteins by using narrow pH range IEF strips. As heavily glycosylated proteins carry a more negative charge due to the terminal sialic acid residues on the glycans, making the protein more acidic, we have found that a pH range of 3-5.6 can pick up most of the important glycoproteins such as A1AT and caeruloplasmin. The narrowed range also amplifies the subtle shifts that would be harder to see using broad pH ranges such as 3-10. This technique can also detect changes in some O-linked disorders, as apo CIII can be studied as well as other O- and N-linked proteins such as α-2-HS-glycoprotein and C1 plasma inhibitor. The drawbacks of this technique are that it is labour intensive and expensive; therefore, although superior, it can only act as a second tier test following an inconclusive transferrin result.
Figure 2: Panel A shows the methodology for the 2D DiGE technique. Individual samples are labelled with fluorescent Cy dyes, then combined and resolved on a single 2D gel. The gel is then scanned for the three dyes, creating overlaid images of the serum proteome that highlight subtle changes in charge and mass in glycoproteins from patients with CDG who have an inconclusive transferrin result. Panel B shows overlaid 2D DiGE images of glycoforms of α-1-antitrypsin and caeruloplasmin showing typical charge and mass change profiles seen in CDG-I and CDG-II. In CDG-I, the abnormally glycosylated proteins have unusual mass and charge and, compared to the normal proteins, are “shifted” to the right and down; in CDG-II, the abnormally glycosylated proteins are “shifted” to the right.
Glycan analysis is informative for elucidating the partial glycan structures that occur in CDG II; however, it is of limited use for CDG I, as glycans are either absent or present in whole. In-depth analysis of the glycans from CDG patients has conventionally been performed using Matrix Assisted Laser Desorption Ionisation Time of Flight Mass Spectrometry (MALDI ToF MS). This established method involves the removal of the N-linked glycans from glycoproteins either chemically or enzymatically. Glycans are then purified and desalted prior to analysis by MALDI ToF. Analysis can be performed on whole tissue, plasma or serum, or on a purified protein such as transferrin or A1AT. Unlike protein and peptide analyses, glycan masses alone are often sufficient to allow identification of the type of glycan attached to the asparagine (i.e. complex, high-mannose or hybrid). In addition, the masses of the peptides covering an occupied sequon are increased by approximately 1 Da due to the conversion of asparagine to aspartic acid by PNGase F, allowing the identification of which glycosylation sequons are occupied. Recent further technical advances in mass spectrometry such as MALDI ion trap profiling have allowed sequencing of glycans that reveals detailed information on glycan structures, thereby distinguishing between primary genetic defects in the N-glycosylation process, Golgi trafficking disorders, and secondary causes of underglycosylation.
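The approximately 1 Da shift used to infer site occupancy after PNGase F treatment can be worked through numerically. The sketch below derives the shift from standard monoisotopic atomic masses and uses it to estimate how many sequons in a peptide were occupied; the function name `occupied_sites`, the tolerance and the example masses are assumptions made for illustration and are not taken from any published workflow.

```python
# Monoisotopic atomic masses in Da (standard values).
MASS_N = 14.0030740
MASS_H = 1.0078250
MASS_O = 15.9949146

# PNGase F releases the glycan and leaves Asp in place of Asn:
# the side chain -CONH2 becomes -COOH, a net loss of N and H and gain of O.
DEAMIDATION_SHIFT = MASS_O - (MASS_N + MASS_H)  # about +0.984 Da

def occupied_sites(theoretical_mass, observed_mass, tol=0.01):
    """Estimate the number of Asn->Asp conversions in a deglycosylated peptide.

    theoretical_mass: mass of the unmodified peptide (all Asn intact), in Da.
    observed_mass:    measured mass after PNGase F treatment, in Da.
    tol:              hypothetical mass tolerance in Da (instrument dependent).
    Returns None when the difference is not a whole number of shifts.
    """
    delta = observed_mass - theoretical_mass
    n = round(delta / DEAMIDATION_SHIFT)
    if n < 0 or abs(delta - n * DEAMIDATION_SHIFT) > tol:
        return None
    return n

# Made-up masses for illustration only.
print(round(DEAMIDATION_SHIFT, 4))       # 0.984
print(occupied_sites(1500.700, 1501.684))  # 1
```

An unoccupied sequon is not a substrate for PNGase F and keeps its asparagine, so a peptide from a CDG I patient with a missing glycan would be expected at the theoretical mass with no shift.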
Analysis of O-glycans by mass spectrometry is more difficult, as O-glycans are more complex and heterogeneous; hence, so far the majority of O-linked disorders have been found by targeted genetic approaches. O-glycans cannot be removed from their sites enzymatically but have to be removed through alkaline β-elimination or hydrazinolysis, which in turn denatures the peptide or protein and thus does not allow any further site occupancy analyses to be undertaken. Technological advances in mass spectrometry are promising for improved N- and O-glycan analysis. The ability of Quadrupole Time of Flight (QToF) instrumentation to facilitate MS experiments, especially on glycans which have been derivatised by permethylation, is allowing clear structural assignment of isomeric glycans. Detection of glycans can be improved by using labelling techniques such as tagging the glycans with fluorophores, which increase the spectral absorption of glycans, thus improving their detection by high performance anion exchange chromatography, HPLC and ESI-MS methods.
Another strategy for analysing glycans is to analyse glycopeptides by mass spectrometry. This does not require a step to remove the glycans, as they are in effect analysed whilst on the peptide. It does, however, require an enrichment step, as the presence of non-glycosylated peptides reduces the sensitivity of subsequent MS analysis. Advances in mass spectrometry are improving glycopeptidomic analysis, as better fragmentation technologies have been applied, such as electron capture dissociation (ECD) and electron transfer dissociation (ETD), allowing the direct mapping of any sites of N- and O-glycosylation.
Enrichment strategies include the use of lectins, which are a diverse group of carbohydrate-binding proteins. Each lectin has its own specificity profile and many have been used extensively in biochemical fields including proteomics, due to their usefulness as detection and enrichment tools for specific glycans. Lectins can be applied to enrich either whole glycoproteins or glycopeptides. Many techniques have been devised using lectins to investigate altered patterns of glycosylation in disease [30,31]. However, lectins have not been used extensively in CDG research; one article describes the reduced binding of CDG-I patient transferrin to ricin. Although lectins have proven a useful tool in studying intact glycans, their usefulness for the investigation of the partial glycans observed in disorders of glycosylation is limited. A better approach to glycopeptide or protein enrichment may be the application of hydrophilic interaction liquid chromatography (HILIC). HILIC has proven to be a convenient method for analysing highly polar molecules such as metabolites and it has been adapted to be efficient in the extraction of glycopeptides. Many glycobiology studies are using HILIC, and one recent study has applied the HILIC method to the glyco-profiling of a therapeutic monoclonal antibody and proteins with several N-linked and O-linked glycosylation sites. Using a data-independent MS acquisition (MSE) function that can quantitate individual ions, unsuspected glycopeptides and site-specific glycan microheterogeneity can be detected.
Further improvement of glycopeptide analysis could be achieved by controlling the size of the glycopeptides. Most MS instruments have an upper detection limit of 2000-2500 Da. Many glycopeptides are above this range; therefore it may be better to analyse smaller glycopeptides by optimising digestion methods with multiple proteases. However, the challenge to this approach is that glycosylation sites are not always located close to the cleavage sites of the standard proteolytic enzymes, potentially resulting in glycopeptides that are still too large for effective tandem MS. One study has addressed this issue by the use of non-specific proteases such as Pronase. These proteases digest the amino acid backbone of a glycoprotein to small (<4 amino acid) peptides and amino acids, except in regions where the presence of glycosylation inhibits digestion. Using this approach the authors were able to identify multiple glycan compositions at each individual glycosylation site and thereby improve the determination of glycan microheterogeneity by mass spectrometry.
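The point about cleavage sites and glycopeptide size can be illustrated with an in silico digest. The following is a minimal sketch, assuming the commonly used trypsin rule (cleave C-terminal to Lys or Arg, but not before Pro) with no missed cleavages; the function names and the toy sequence are hypothetical, and only the peptide backbone mass is calculated, so the attached glycan would add to each value.

```python
import re

# Monoisotopic amino acid residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

def tryptic_peptides(sequence):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def peptide_mass(peptide):
    """Monoisotopic mass of the unmodified peptide backbone."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def sequon_peptides(sequence):
    """Return (peptide, backbone mass) for tryptic peptides holding a sequon Asn."""
    sites = {m.start() for m in re.finditer(r"(?=N[^P][ST])", sequence)}
    found, start = [], 0
    for pep in tryptic_peptides(sequence):
        end = start + len(pep)
        if any(start <= s < end for s in sites):
            found.append((pep, round(peptide_mass(pep), 3)))
        start = end
    return found

# Hypothetical toy sequence, not a real protein.
print(sequon_peptides("AGKNGSLRPEK"))
```

Running the same functions with a different cleavage rule would show how the choice of protease changes the size of the peptide carrying each site, which is the consideration behind using multiple or non-specific proteases.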
The compendium of glycosylation disorders is ever increasing and, with the development of better biochemical and genetic investigative techniques such as next-generation sequencing, the potential for many new disorders of glycosylation to be discovered is very real. Although emerging rapidly, these methods still take considerable time and are expensive; therefore there is a need for improved biochemical testing that can give comprehensive information on the effect on several glycoproteins or glycolipids, channelling subsequent research into the definition of the exact step of the biosynthetic pathway of glycosylation that is affected. To date, IEF of transferrin and apo CIII are the only tests broadly available in clinical practice as a first step in diagnosing a disorder of glycosylation.
Biomarker discovery research is a growing field, particularly in glycobiology, mostly due to the discovery that many cancers have altered glycosylation profiles [36,37]. However, there is a lack of translation of markers into clinical laboratory tests. This is clearly apparent and of great importance in the field of glycosylation disorders. The study of protein glycosylation is accepted as problematic and complex, and there is still no established, accepted method for the definitive analysis of glycans. However, with recent technological advances in mass spectrometry, many groups are publishing various promising methods in N- and O-linked glycan and glycopeptide analysis. These methods can be used to determine defects in glycosylation in a patient but they are too complex and expensive for translation to a clinical laboratory test (Figure 3). Glycopeptide analysis seems the most promising for a potential translational test. Peptides are emerging as a means of accurate protein quantitation using simple tandem LC-MS/MS and have an economic advantage over antibody based testing [39,40].
Marker glycopeptides that have been detected and linked to specific glycosylation defects using more complex technology could subsequently be adapted to a simple tandem LC-MS/MS test. The adaptation of this methodology, leading to a high throughput specific test to investigate glycosylation disorders, is an exciting possibility.