brandonrose.org
Document Clustering with Python
http://brandonrose.org/clustering
Document Clustering with Python. In this guide, I will explain how to cluster a set of documents using Python. My motivating example is to identify the latent structures within the synopses of the top 100 films of all time (per an IMDB list). See the original post for a more detailed discussion of the example. This guide covers: tokenizing and stemming each synopsis; transforming the corpus into vector space using tf-idf; calculating cosine distance between each document as a measure of similarity.
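The three steps named in this snippet (tokenize and stem, tf-idf, cosine distance) can be sketched with the standard library alone. This is a minimal illustration and not the guide's own code: the toy corpus, the crude_stem suffix stripper (a stand-in for a real stemmer), and all function names are assumptions made for this example, and the tf-idf weighting shown is one common variant among several.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; the guide itself works on film synopses.
SYNOPSES = [
    "A mob boss hands the family business to his reluctant son.",
    "The aging boss of a crime family passes control to his son.",
    "A farm boy joins rebels fighting an evil galactic empire.",
]


def crude_stem(token):
    """Strip a few common English suffixes (a stand-in, not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize_and_stem(text):
    """Lowercase the text, keep alphabetic runs, and stem each token."""
    return [crude_stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency),
    so a term that occurs in every document gets weight 0.
    """
    tokenized = [tokenize_and_stem(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors (1.0 if either is all zero)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


vectors = tfidf_vectors(SYNOPSES)
# Pairwise distance matrix: dist[i][j] is the distance between documents i and j.
dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]
```

A distance matrix like dist is the usual input to a clustering step; in this toy corpus the two crime-family synopses end up closer to each other than either is to the third.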
zamojski.net
Marcin Zamojski - Other projects
http://zamojski.net/other.php
Other projects and code snippets. Violin/box plots in D3.js. Implementation of violin/box plots in D3.js. The familiar, glanceable format of box plots meets the robustness of kernel estimation. An extension that allows using BibTeX citations within Jupyter. Timing functions in Julia. Integration of D3.js. Designed for rendering static versions (through PhantomJS) of visualizations. Note that for dynamic charts it is probably better to use some other package, e.g. mpld3.
frederikdurant.com
Selling a new banking product to the right customers (first) – Frederik Durant's .data blog
http://frederikdurant.com/projects/banking-project
Frederik Durant's .data blog. On a data science sabbatical in NYC. Target marketing. Customer ranking. Logistic regression. Selling a new banking product to the right customers (first). February 20, 2015. Project: Target Marketing for a Bank. During weeks 4, 5 and 6, the Metis Data Science Bootcamp zoomed in on the following technologies: SQL and MySQL in the cloud. Supervised learning with scikit-learn. Interactive visualization with matplotlib. Interactive widgets in IPython. ROC curv...
onedatapoint.com
Clustering for Computational Chemistry | One Data Point
http://onedatapoint.com/index.php/2015/04/27/clustering-computational-chemistry
A Statistically Insignificant Sampling. Clustering for Computational Chemistry. April 27, 2015. June 2, 2015. For the past five years, I’ve worked as a theoretical chemist in the Head-Gordon group. We care about assigning chemical reactions to groups because we have heuristic ideas about how our methods should perform on these groups. Ultimately we want to have classes of test data with well characterized errors for subsets of methods. By using unsupervised learning, we can remove the appeal to g...Clust...
onedatapoint.com
David Stück | One Data Point
http://onedatapoint.com/index.php/author/admin
A Statistically Insignificant Sampling. Emoji Translator: Early Update. August 2, 2015. August 5, 2015. I’ve got a little bit of an update to bring the site up to speed with the progress I’ve made on the project, now that I’ve finished up the first draft of my thesis! The main pipeline of the project is scraping data from Twitter, cleaning the data, training a Word2Vec model, and developing models to translate to emojis. In terms of other metadata, I only consider English tweets with 5 or more words and...
cyrille.rossant.net
Cyrille Rossant - Scientific Python in the Browser: it's coming!
http://cyrille.rossant.net/scientific-python-in-the-browser-its-coming
Scientific Python in the Browser: it's coming! March 31, 2014. There is currently a manifest trend in the scientific Python ecosystem: Python is slowly but surely coming to the browser. It's a real challenge, but we're getting there. In this post, I want to give an overview of where we are, and where we're headed. Why it's a good thing. Python is becoming one of the most popular open source platforms for scientific computing and data analysis. On the other hand, the Web is today the platform of choice.
ioam.github.io
Introduction — HoloViews
http://ioam.github.io/holoviews
July 1st 2015: HoloViews 1.3 released and now available on PyPI. May 14th 2015: Talk on HoloViews to appear at the SciPy 2015 conference. Mar 19th 2015: HoloViews now available on SageMathCloud. Feb 25th 2015: HoloViews announced as the winner in its category at the UK Open Source Awards 2015. Composable, declarative data structures for building even complex visualizations easily. HoloViews is a Python library. More detailed example. Illustrating how even a simple annotation can be used to reflect other data ...
cyrille.rossant.net
Cyrille Rossant - Big Data visualization with WebGL, part 1: Overview
http://cyrille.rossant.net/big-data-visualization-webgl-part1
Big Data visualization with WebGL, part 1: Overview. October 15, 2014. In this post series, I'll talk about the big data visualization platform I'm currently developing with WebGL. In this first post, I'll give the main motivations for this project. The next posts will contain the technical details. This project brings together several modern trends in data science and computing: The modern Web platform. Or the BRAIN Initiative. In more and more disciplines, datasets are becoming too big for our computer...
onedatapoint.com
Data Science | One Data Point
http://onedatapoint.com/index.php/category/datascience
A Statistically Insignificant Sampling. Emoji Translator: Early Update. August 2, 2015. August 5, 2015. I’ve got a little bit of an update to bring the site up to speed with the progress I’ve made on the project, now that I’ve finished up the first draft of my thesis! The main pipeline of the project is scraping data from Twitter, cleaning the data, training a Word2Vec model, and developing models to translate to emojis. In terms of other metadata, I only consider English tweets with 5 or more words and...