Leopoldo Pla Sempere Lecturer & dev @ UAlicante. Sometimes musician.

My Expertise

Hello! My name is Leo, and I'm a passionate software developer from Spain. Musician and hardware researcher by night, senior software dev and lecturer by day. Game Boy enthusiast either way.

Since 2015, I have worked with state-of-the-art Natural Language Processing, Machine Learning and Music Information Retrieval techniques to create useful new tools and technologies in agile environments.

In NLP, I have built everything from simple scrapers and HTML text extractors to preprocessing and parallel corpora cleaning pipelines, all for a successful set of translation platforms.

I lecture on computer science subjects, such as programming fundamentals and project management and planning, in several degrees, including Mathematics and Computer Science.

In music, the technical side is MIR, with several projects on automatic dodecaphonic composition and classical composer detection from audio; the practical side is live jazz, wind-orchestra and rock music. Some bleep-bloop chiptunes too.

When I have some spare time, I calibrate my 3D printer and design useful models. Also PCBs.


Features are not everything in software development. I always stay up to date on code style and focus on designing useful APIs and intuitive interfaces.


Coding in a variety of languages and paradigms for modern times! From local companies' websites to wide HPCC deployments.


Constantly developing with and for humans. Working with the right collaborative tools in SCRUM/agile-like groups makes the daily routine easier!

Featured Projects



MaCoCu focuses on collecting monolingual and parallel data from the Internet, especially for under-resourced languages and DSI-specific data.

Check it out


Game Boy-related custom hardware source files, easily reproducible with services like OSH Park or PCBWay.

Check it out


Bitextor generates translation memories from multilingual websites or WARC files. A complete pipeline ready to be used in production distributed environments.

Check it out


Crawling thousands of websites, combining them with Internet Archive data and processing everything efficiently with open-source software to create a huge, powerful and heterogeneous parallel corpus for Machine Translation systems.

Check it out

Reverso Context

The most advanced and fastest parallel corpora search tool, finding aligned documents and sentences from many public resources, with several million page views every day and deep integration into the Reverso ecosystem.

Check it out


If you want to know more about me or my work, take a look at my CV:

Vita 📃