Python

My first Django app

This is my implementation of the Django tutorial.

Hacking the Data Transformation Interview

I am (still) looking for a job in data or software engineering, and I am preparing for all kinds of technical interviews: coding, algorithms, system design, SQL, and computer science fundamentals quizzes. Data engineer is a loosely defined role, and at some companies people with this title work as ETL (extract, transform, load) engineers. Data transformation topics can therefore come up during the interview. In this post, I try to hack the interview with a focus on data transformation.

Plain text accounting tool Beancount in Emacs

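Yesterday I got the Chinese environment working in Emacs. Today I realized I had not kept my books in a long time, and since the 随手记 app always takes a lot of time, I decided to keep the momentum going and move my bookkeeping over as well.
Double-entry bookkeeping: As an undergraduate I took a second major in finance, and the most tedious and headache-inducing course in it was accounting. I did not enjoy lectures, and after graduating I forgot most of what I had learned about accounting. Yet its double-entry rule, "every debit has a credit, and debits must equal credits," turned out to be the thing I use most in daily life.
Double-entry bookkeeping sounds complicated, but for an individual it just means sorting every transaction into categories and fitting it into this equation:
Assets + Expenses = Liabilities + Equity + Income
For personal bookkeeping, this equation combined with the notions of debit and credit is intimidating, so I rearrange it slightly:
(+Assets) + (-Liabilities) + (+Expenses) + (-Income) + (-Equity) = 0
Assets and liabilities are tied directly to money: bank accounts, RMB or US dollar bills, credit card accounts, IOUs. All of these can be seen as different representations of money.
Expenses and income are tied to money indirectly: salary, goods, services. These can be seen as different forms of productive capital and commodity capital. Income here differs from its literal meaning and can be understood as productive capital (labor and so on).
Equity means little for an individual; it is generally used to record historical surplus and amounts that cannot be accounted for.
The plus and minus signs in the rearranged formula let the sign of each item indicate the direction in which money flows. That sounds a bit abstract, so to put it the way Zweig might: "Everything people receive from fate has its price quietly noted down." Paying for a service decreases assets (money spent) and increases expenses (service received); a salary payment increases assets (money received) and decreases income (the labor time you had is used up); when someone borrows money from you, assets decrease (your bank or cash balance falls) and liabilities increase (the other person's debt held with you).
To sum up: since humans cannot yet manipulate time, expenses are generally positive and income is generally negative. Barring bankruptcy, assets are generally positive; money others owe you is positive, and credit card debt (money you owe others) is negative.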
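The sign convention above can be checked mechanically: in every transaction, the signed amounts should sum to zero. Here is a minimal sketch of that check, using the two examples from the text (paying for a service, receiving a salary). The account names and the `is_balanced` helper are hypothetical and chosen only for illustration.

```python
from decimal import Decimal


def is_balanced(postings):
    """Return True when the signed amounts of one transaction sum to zero.

    `postings` is a list of (account_name, signed_amount) pairs.
    """
    total = sum((amount for _, amount in postings), Decimal("0"))
    return total == 0


# Paying 50 for a service: assets go down, expenses go up.
buy_service = [
    ("Assets:Cash", Decimal("-50")),
    ("Expenses:Service", Decimal("50")),
]

# A salary of 3000 arrives: assets go up, income goes down (negative).
salary = [
    ("Assets:Bank", Decimal("3000")),
    ("Income:Salary", Decimal("-3000")),
]

print(is_balanced(buy_service), is_balanced(salary))
```

`Decimal` is used instead of `float` so that amounts such as 0.10 add up exactly.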

CSV2Bean

• Provides a smooth transition from an accounting app to plain text accounting tools.
• Converts the .csv file exported from the Sui accounting app into a Beancount text ledger.
• Features an Emacs Lisp function to quickly add transactions to Beancount from Emacs.
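To give a rough idea of what such a conversion involves, here is a minimal sketch in Python. It is not the CSV2Bean implementation: the column names (`date`, `narration`, `amount`, `from_account`, `to_account`), the default currency, and both function names are assumptions made for illustration, and the real export format of the app is likely different.

```python
import csv
import io
from decimal import Decimal


def row_to_entry(row, currency="CNY"):
    """Turn one CSV row (hypothetical columns) into a Beancount transaction."""
    amount = Decimal(row["amount"])
    return (
        f'{row["date"]} * "{row["narration"]}"\n'
        f'  {row["to_account"]}  {amount} {currency}\n'
        f'  {row["from_account"]}  {-amount} {currency}\n'
    )


def csv_to_beancount(text, currency="CNY"):
    """Convert CSV text with a header row into Beancount ledger text."""
    reader = csv.DictReader(io.StringIO(text))
    # Entries are separated by a blank line, as is conventional in ledgers.
    return "\n".join(row_to_entry(row, currency) for row in reader)


sample = (
    "date,narration,amount,from_account,to_account\n"
    "2020-01-01,Lunch,12.50,Assets:Cash,Expenses:Food\n"
)
print(csv_to_beancount(sample))
```

Each row becomes one transaction with two postings whose amounts cancel out, which is what keeps the ledger balanced.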

Gene Variation Analysis of Stomach Cancer

• Extracted cancer research data from the Cancer Genome Atlas Network (R).
• Applied GISTIC clustering analysis to patient-indexed tables to identify molecular subgroups (Bioinformatics).
• Mapped the gene list to biological annotations with the functional annotation tool DAVID (Python).
• Summarized and visualized stomach cancer-related gene candidates with the R package MAFtools (R).

Multithread Web Crawler

• Wrote a class that handles multithreaded website crawling within a given domain.
• Features a breadth-first search algorithm and a thread pool to visit all URLs asynchronously.
• Handles various status codes, timeouts, and exceptions in a structured manner.
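One way to combine breadth-first search with a thread pool is to fetch each BFS level in parallel, as sketched below. This is an illustration under assumptions, not the project's code: `crawl`, `safe_fetch`, and the `fetch_links` callback are hypothetical names, and the callback is expected to perform the HTTP request and return the absolute URLs found on a page.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def safe_fetch(fetch_links, url):
    """Call fetch_links(url); treat any failure as a page with no links."""
    try:
        return list(fetch_links(url))
    except Exception:
        return []


def crawl(start_url, fetch_links, max_workers=8):
    """Breadth-first crawl restricted to the domain of start_url.

    Returns the set of URLs that were scheduled for a visit.
    """
    domain = urlparse(start_url).netloc
    visited = {start_url}
    frontier = [start_url]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            # Fetch every page of the current BFS level in parallel.
            results = pool.map(lambda u: safe_fetch(fetch_links, u), frontier)
            next_frontier = []
            for links in results:
                for link in links:
                    same_domain = urlparse(link).netloc == domain
                    if same_domain and link not in visited:
                        visited.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return visited
```

Passing the fetcher in as a callback keeps the traversal logic separate from networking, so it can be exercised with an in-memory link graph. The `visited` set is only touched by the main thread, so no lock is needed in this variant.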

RxMiner

• Developed an ETL pipeline on AWS to extract, integrate, and transform prescription data from multiple providers, enabling nationwide queries on prescription drug usage (Python, AWS, Airflow).
• Validated and combined publicly available Medicaid and Medicare datasets with NIH, FDA, and NPPES sources into a SQL-queryable database in Redshift, visualized on a website (Redshift, Tableau, JavaScript, CSS).
• Implemented a custom connector to Redshift/PostgreSQL that is 20 times more efficient (Python).