Shon Kurian George

Spatial Transcriptomics Pipeline Development
Image-based cell profiling (cypro: R package)

Edinburgh, United Kingdom & Kerala, India · Reachable on Email/LinkedIn · Open to Collaboration

About

Bioinformatician with an MSc in Bioinformatics from the University of Edinburgh who ships reproducible analyses and tools, with a particular interest in interactive, easy-to-use applications. Skilled in R, Python, Bash scripting, and building bioinformatics pipelines.

Skills

R / Shiny, Python, Bash scripting
Spatial transcriptomics (SPATA2, Seurat)
Git/GitHub, Docker, AWS EC2, Nextflow, MySQL
Excel, Vim, Ollama, Open WebUI

Work Experience

cypro R Package

R Package Developer · Nov 2024 – Present

Developing an R package to streamline image-based cell profiling workflows by integrating data from diverse platforms including CellProfiler and CellTracker. Enhanced package usability by implementing intuitive Shiny-based interfaces that allow detailed specification down to individual well-level configurations, significantly simplifying user interaction.

Refined S4 class objects within cypro to ensure accurate data integration from imaging platforms, enabling comprehensive cell movement analyses. Incorporated proactive user warnings and notifications within the Shiny application, minimising user errors, safeguarding data integrity, and enhancing the overall user experience.

View GitHub repo

10x Visium Spatial Transcriptomics Pipeline to Analyse miRNA Expression Patterns

MSc Bioinformatics Dissertation · Mar 2024 – Sep 2024

Developed an end-to-end pipeline to investigate whether miRNAs act as guardians of gene expression at cell-type boundaries. Utilised SPATA2 and Seurat frameworks to process datasets into analysis-ready form, then performed BayesSpace and Hartigan-Wong K-means clustering to segregate cell types.

Conducted TargetScan miRNA target analysis across 8 datasets (brain, heart, liver), detecting tissue-specific miR-124 (brain) and miR-1 (heart) with high confidence. Identified neighbouring clusters showing high and low target expression for both miRNAs. Built an accompanying Shiny app to interactively view the analysis results (SPATA2 Shiny App).

View GitHub repo

Interactive Applications (Hosted on AWS EC2)

SPATA2 Dataset Specific Analysis Heatmap Generator (R Shiny)

R Shiny app that automates heatmap generation and dataset comparison, cutting the manual processing time between datasets and making parameter comparisons within a specific dataset instantaneous.

Launch app

Protein Conservation & Motif Suite (Python)

Developed a Python script with a terminal interface to query a taxonomic group and retrieve protein sequences via EDirect. Built a workflow integrating NCBI and EMBOSS tools to assemble datasets (up to 1,000 sequences), automate alignments, generate customisable conservation plots, and perform motif discovery.

Launch app

Small Molecule Database and Visualisation Application (PHP & MySQL)

Full-stack application built with PHP and MySQL to query, filter and visualise 6,000+ small-molecule compounds sourced from the EDULISS (EDinburgh University Ligand Selection System) database.

Key features include:

Complete database infrastructure built from scratch
Filter by manufacturer and molecular properties
Statistical computation of filtered compounds
SMILES (Simplified Molecular Input Line Entry System) visualisation
Complexity indicators for individual compounds

Launch app

Malayil Family Tree Interactive Visualisation (Python & D3.js)

A personal project digitising my family's heritage. Source data was publicly available via malayilkudumbam.com. Several local models were trialled first (DeepSeek-OCR, NuMarkdown), but Google Cloud Document AI's accuracy on the complex two-column scanned layouts made it the clear choice, despite a small usage fee. The Python pipeline processes 14 scanned historical PDFs (7 genealogy and 7 history documents) and produces a unified master JSON of 1,953 persons spanning 6 family branches and 8 generations. The result is served through an interactive D3.js visualisation with zoom, pan, name search, and common-ancestor path tracing.
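As an illustration of common-ancestor path tracing, the Python sketch below walks two people up to their nearest shared ancestor and joins the two chains into a single path. It is a minimal sketch rather than the project's actual code: the `parent_of` mapping, the function names, and the single-letter person IDs are all hypothetical, and it assumes each person records one parent in an acyclic lineage.

```python
def ancestors(person, parent_of):
    """Return [person, parent, grandparent, ...] up to the root.

    Assumes parent_of is acyclic and maps each child to a single parent.
    """
    chain = [person]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def common_ancestor_path(a, b, parent_of):
    """Return the path a -> nearest common ancestor -> b, or None if unrelated."""
    chain_a = ancestors(a, parent_of)
    chain_b = ancestors(b, parent_of)
    position_in_b = {person: i for i, person in enumerate(chain_b)}
    for i, person in enumerate(chain_a):
        if person in position_in_b:
            # Climb from a to the shared ancestor, then descend to b.
            descent = chain_b[:position_in_b[person]][::-1]
            return chain_a[:i + 1] + descent
    return None


# Hypothetical toy lineage: A is the root; B and C are A's children.
parent_of = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "C"}

print(common_ancestor_path("D", "F", parent_of))  # ['D', 'B', 'A', 'C', 'F']
print(common_ancestor_path("D", "E", parent_of))  # ['D', 'B', 'E']
```

A dictionary lookup on the second chain keeps the search linear in the depth of the tree, which is ample for a lineage of 8 generations.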

Note: this is a personal project and not an official release by the Malayil family organisation.

Explore the tree

Badminton Shuttle Detection Demo (RF-DETR & Flask)

Custom RF-DETR (Roboflow-Detection Transformer) object detection model trained for 200 epochs on 195 manually labelled images, achieving 85.7% precision, 75.0% recall, and an 80.0% F1 score. A Flask web app presents 6 curated rally clips with a side-by-side toggle between raw and annotated detection footage, along with per-rally filtering and detection statistics. The app is deployed on AWS EC2 via Nginx.
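The reported F1 score follows from the precision and recall figures, since F1 is their harmonic mean. The short Python check below reproduces the arithmetic; the function name is illustrative and not taken from the project.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when there are no true positives
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the project description above.
reported_precision = 0.857
reported_recall = 0.750

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.1%}")  # F1 = 80.0%
```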

View the demo

👀 New Interactive Application 🍳

🚧 In Development

Stay tuned for updates!

Projects

Detailed Collaborative Machine Learning Project on Modifiable Risk Factors Linked to Dementia

Built a polynomial regression model using the SHARE (Survey of Health, Ageing and Retirement in Europe) dataset to predict cognitive scores and identify modifiable risk factors, which together account for 40% of dementia cases worldwide. Cross-referenced the model findings with the Lancet Commission Population Attributable Fraction (PAF) framework to analyse 7 key interventions, revealing mental health, physical activity, social engagement, and education as critical lifestyle interventions for reducing dementia risk.
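For readers unfamiliar with the population attributable fraction (PAF), the sketch below computes the basic, unadjusted form using Levin's formula, PAF = p(RR - 1) / (1 + p(RR - 1)), where p is the prevalence of exposure and RR is the relative risk. This is only an illustration: the prevalence and relative risk are hypothetical values, not results from the report, and the Lancet Commission additionally weights each factor for overlap between risk factors, which this sketch does not do.

```python
def population_attributable_fraction(prevalence: float, relative_risk: float) -> float:
    """Levin's formula for the unadjusted PAF of a single risk factor.

    prevalence: fraction of the population exposed, in [0, 1].
    relative_risk: risk in the exposed divided by risk in the unexposed.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a fraction between 0 and 1")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs for illustration only (not taken from SHARE or the report).
example_prevalence = 0.20
example_relative_risk = 1.5

paf = population_attributable_fraction(example_prevalence, example_relative_risk)
print(f"PAF = {paf:.1%}")  # PAF = 9.1%
```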

Read the report

Analysis and Critique of the Automeris io Moth de novo Genome Assembly and Annotation by Skojec et al., with a Comparison Assembly Using wtdbg2

Critically evaluated the genome assembly workflow of Skojec et al. and compared assembly quality between Hifiasm (N50: 15.78 Mb) and wtdbg2 (N50: 1.1 Mb). Demonstrated the superior performance of Hifiasm, which reached 98.4% completeness with a 490 Mb assembly in only 600 contigs (vs 3,362 for wtdbg2).
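The N50 values quoted above summarise assembly contiguity: N50 is the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembly length. The Python sketch below shows that calculation on made-up contig lengths; the numbers are a toy example and are unrelated to the Automeris io assemblies.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of the
    contig at which the cumulative sum first reaches half the total length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (arbitrary units), for illustration only.
toy_lengths = [20, 100, 40, 80, 60]
print(n50(toy_lengths))  # 80
```

In the toy example the total is 300, so the threshold is 150: the longest contig (100) falls short, and adding the second (80) crosses it, giving an N50 of 80.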

Read the analysis