blog.samibadawi.com
Languages and Logic: July 2011
http://blog.samibadawi.com/2011_07_01_archive.html
Natural language processing, machine learning and computer vision. Saturday, July 23, 2011. Scala, Eclipse and Maven integration tutorial. I have evaluated Scala as a language for cloud computing and Hadoop. One requirement was a robust development environment, with a real build system and a good IDE with code completion and debugging. The combination of Scala, Eclipse and Maven seemed like a fit for this requirement, but my initial experience was mixed. Problems with Scala, Eclipse and Maven integration. This will create th...
sparkdeveloper.com
Spark | Sparklandia
http://sparkdeveloper.com/category/spark
Machine Learning, ODPi, Deduping with Scala, OCR. May 19, 2016. ODPi for Hadoop Standards. The ODPi, working with the ASF, aims to consolidate Hadoop and all its versions. There are too many custom distributions, each with different versions of the 20 or so tools that make up the Apache big data stack. Being able to move seamlessly between HDP, CDH, IBM, Pivotal and MapR would be awesome. For now, HDP, Pivotal and IBM are part of the ODPi. It lets you read HDFS/Hive tables from EDB with SQL. Using Apache NiFi with Tesseract for OCR. Data in Motion: Strea...
clojurelx.blogspot.com
Clojure & lx: Why Clojure lx?
http://clojurelx.blogspot.com/2011/11/why-clojure-lx.html
Wednesday, November 16, 2011. The NLTK is a natural choice for students of linguistics and computer science. It has matured into a stable project, its users are very active, and it is now used outside of academia. Those who are into functional programming can use the Scheme Natural Language Toolkit, or learn from Natural Language Processing for the Working Programmer. And those who need the JVM can turn to ScalaNLP. So why bother with Clojure? Well, there is no clear boundary! Bu...
yuzhouwan.com
Real-time ML with Spark | yuzhouwan
http://yuzhouwan.com/2015/08/13/Real-time-ML-with-Spark
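http://www.yuzhouwan.com. Real-time ML with Spark. "Apache Spark is a fast and general engine for large-scale data processing." (official website). An RDD provides an abstraction of an immutable, partitioned dataset that can be computed in parallel. Dependencies come in two kinds: narrow dependencies, such as map/filter/union, and wide dependencies, such as groupByKey. If a parent has the same data structure as the cogroup, a OneToOneDependency is used; otherwise, the join requires a shuffle (ShuffleDependency). [Recommendation]: "With wide dependencies, the next step can only proceed once all parent partitions have been computed and transferred, so wide dependencies should be kept to a minimum to avoid the cost of recomputation after a failure." Lineage: when a computation fails, Spark looks for the nodes with the lowest recomputation cost instead of recomputing everything. The data stream is split into batches by time interval (Duration). Num = rand...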
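The narrow/wide distinction, the cogroup rule and Duration-based batching can be illustrated without a cluster. The sketch below is a toy, single-machine model and NOT the Spark API. Partitions are plain Vectors, and a "partitioner" is reduced to an optional partition count. The object DependencyModel and every method in it are hypothetical names chosen for illustration. Narrow operations transform each partition on its own. The wide groupByKeyWide has to read every partition and redistribute records by key hash.

```scala
// Toy, single-machine model of Spark-style partitioned data. This is NOT the
// Spark API: partitions are plain Vectors and all names here are hypothetical.
object DependencyModel {
  // Each inner Vector plays the role of one partition of key/value records.
  type Partitions[K, V] = Vector[Vector[(K, V)]]

  // Narrow dependency (like map/filter): every output partition is computed
  // from exactly one input partition, so no records move between partitions.
  def mapValuesNarrow[K, V, W](parts: Partitions[K, V])(f: V => W): Partitions[K, W] =
    parts.map(_.map { case (k, v) => (k, f(v)) })

  def filterNarrow[K, V](parts: Partitions[K, V])(p: ((K, V)) => Boolean): Partitions[K, V] =
    parts.map(_.filter(p))

  // Wide dependency (like groupByKey): an output partition may need records
  // from every input partition, so all records are redistributed by key hash.
  def groupByKeyWide[K, V](parts: Partitions[K, V], numOut: Int): Vector[Vector[(K, Vector[V])]] = {
    // Non-negative modulo of the key's hash code, similar in spirit to Spark's HashPartitioner.
    def bucket(k: K): Int = {
      val m = k.hashCode % numOut
      if (m < 0) m + numOut else m
    }
    val all = parts.flatten // the "shuffle": every input partition is read
    Vector.tabulate(numOut) { i =>
      all
        .filter { case (k, _) => bucket(k) == i }
        .groupBy(_._1)
        .toVector
        .map { case (k, kvs) => (k, kvs.map(_._2)) }
    }
  }

  // Which dependency a cogroup would need on one parent. A "partitioner" is
  // simplified to an optional hash-partition count (a hypothetical stand-in).
  sealed trait Dep
  case object OneToOne extends Dep
  case object Shuffle extends Dep

  def cogroupDependency(parentPartitions: Option[Int], targetPartitions: Int): Dep =
    if (parentPartitions.contains(targetPartitions)) OneToOne else Shuffle

  // Discretize a stream: bucket (timestampMillis, event) pairs into batches
  // that start at multiples of durationMs, keyed by batch start time.
  def batchByDuration[A](events: Seq[(Long, A)], durationMs: Long): Map[Long, Seq[A]] =
    events
      .groupBy { case (t, _) => t / durationMs * durationMs }
      .map { case (start, es) => start -> es.map(_._2) }
}
```

In real Spark, RDD transformations, partitioners and the streaming batch interval handle this work. The model only shows that a narrow step never looks outside its own partition, while the wide step reads all of them. That is why the recommendation above advises keeping wide dependencies to a minimum.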
lyh.me
Scala Workshop
http://www.lyh.me/slides/workshop.html
A quicksort over a list of doubles:

    def quickSort(a: List[Double]): List[Double] = a match {
      case Nil => Nil
      case x :: xs =>
        val (lt, gt) = xs.partition(_ < x)
        quickSort(lt) ++ List(x) ++ quickSort(gt)
    }

- Pattern matching → a match { case ... }
- Powerful collections → xs.partition
- Anonymous functions, a.k.a. λ → _ < x
- Tuple decomposition → val (lt, gt) = ...
- Statically typed, with type inference
- Concise, expressive and flexible syntax
- Great for modelling data flow

The Not So Good Parts:
- Complexity: traits, implicits, advanced types, …
- Slow compilation and crazy stack traces

Scala in Big Data.
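This example makes "great for modelling data flow" concrete with a small word-count pipeline over ordinary Scala collections. It is not taken from the workshop slides, and WordCount and wordCount are illustrative names. The flatMap, then groupBy, then map shape is the same one a typical Spark word count uses. That hints at why Scala comes up in big data work.

```scala
// A small data-flow pipeline over plain Scala collections (no Spark needed):
// split lines into words, normalize them, and count occurrences.
object WordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // one record per whitespace-separated token
      .map(_.toLowerCase)         // normalize case
      .filter(_.nonEmpty)         // drop empty tokens produced by leading spaces
      .groupBy(identity)          // word -> all of its occurrences
      .map { case (word, occurrences) => word -> occurrences.size }
}
```

Each stage is a pure function from one collection to the next, so the pipeline reads top to bottom like a data-flow graph.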
sparkdeveloper.com
Machine Learning, ODPi, Deduping with Scala, OCR | Sparklandia
http://sparkdeveloper.com/2016/05/19/machine-learning-odpi-deduping-with-scala-ocr
lmf-ramblings.blogspot.com
My Corner of the World: February 2012
http://lmf-ramblings.blogspot.com/2012_02_01_archive.html
My Corner of the World. Observations, solutions to problems, and even a few rants about Linux, free software, and computing in general. Wednesday, February 29, 2012. Clojure in pure Python is a great idea. Edit: Here's the link. Not good enough for me to do it myself, but I'd be very happy if someone else did it. Why? Another big advantage is getting us away from Java. The build tools, the classpath, the .com.lengthy.verbose.wtf.on.and.on are inherited from Java, and serio... Why am I writing this here?
opus.lingfil.uu.se
DataProcessingTools – LetsMT
http://opus.lingfil.uu.se/letsmt-trac/wiki/DataProcessingTools
Alignment and (S)MT tools. Other MT and alignment tools. Corpus management and interfaces.
- Building parallel corpora: InterText, http://wanthalf.saga.cz/intertext; http://www.textforge.cz/
- The Corpus Workbench: http://cwb.sourceforge.net/
- Example of parallel corpus search: http://cogsci.uni-osnabrueck.de/ korpora/ws/CQPdemo/Europarl/frames-cqp.html
- http://nlp.fi.muni.cz/trac/noske/
- http://www.tradooit.com/
- MkAlign, to explore parallel corpora: http://www.tal.univ-paris3.fr/mkAlign/
demand-side-science.jp
DSS Tech Blog - The technical blog of Demand Side Science Inc.
http://demand-side-science.jp/blog
Demand Side Science, inc. The technical blog of Demand Side Science. Electron Scala.js team. I, Shibuya, have released a dog-or-cat classification app. Spark GraphX Sonification team.
- Source code management: GitLab → GitHub private repositories
- Document management: GitLab → (Sphinx) → GitHub docs-only repository, wiki
- Task management: Redmine → GitHub Issues, Waffle.io
- CI: Jenkins → CircleCI
- Deployment: (Capistrano) → Fabric → Ansible, sbt-art
- DWH: Hadoop (HBase/Hive) → Amazon Redshift
- Unit testing: Specs2 → ScalaTest, ScalaCheck
- Chat: IP Messenger → IRC → HipChat (Slack), Hubot
Looking at it this way, we have been steadily moving from on-premises infrastructure to an AWS and SaaS setup.