Big Data Memo

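As heaven moves with vigor, the noble person strives ceaselessly to improve; as the earth is receptive, the noble person carries all things with great virtue.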

Remote access to Jupyter Notebook

Log in to the remote server. Generate the configuration file: jupyter notebook --generate-config. Generate a password: open ipython and create a hashed password: In [1]: from notebook.auth import passwd In [2]: passwd() Enter password: Verify password: Out[2]: 'sha1:c...
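The `passwd()` call in the excerpt returns a salted hash rather than the password itself. The sketch below illustrates that idea with the standard library only. It assumes an `algorithm:salt:digest` layout like the `sha1:c...` string shown above; `make_hash` and `check_hash` are hypothetical helper names, not part of the `notebook` package, and this is not the library's own code.

```python
import hashlib
import hmac
import secrets


def make_hash(passphrase, algorithm="sha1"):
    """Return 'algorithm:salt:digest' for the given passphrase (illustrative layout)."""
    salt = secrets.token_hex(6)  # 12 random hex characters
    digest = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    return ":".join((algorithm, salt, digest))


def check_hash(stored, passphrase):
    """Recompute the digest with the stored salt and compare in constant time."""
    try:
        algorithm, salt, digest = stored.split(":", 2)
        candidate = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    except ValueError:
        # Malformed string or unknown algorithm name
        return False
    return hmac.compare_digest(candidate, digest)


hashed = make_hash("example-passphrase")
print(hashed.split(":")[0], check_hash(hashed, "example-passphrase"))
```

Because the salt is random, two calls with the same passphrase give different strings, so only the stored string needs to go into the configuration file.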

Posted by Big Data Memo on February 10, 2017

Zipline quantitative trading platform

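What is a quantitative strategy? A complete strategy consists of inputs, strategy logic, and outputs; the strategy logic has to account for stock selection, market timing, position management, and take-profit/stop-loss rules. Links: https://uqer.io/community/share/567ba9c3228e5b344568822c https://uqer.io/community/share/56749fca228e5bab38c977d1...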

Posted by Big Data Memo on February 7, 2017

Technology selection for yh model development

Posted by Big Data Memo on January 20, 2017

The ABCs of quantitative platforms

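A quantitative strategy research platform built on zipline + pyfolio solves most of the problems of strategy backtesting and evaluation. Kudos to quantopian for open-sourcing them! Quantitative platforms in China. Author: anonymous user. Link: https://www.zhihu.com/question/35097533/answer/91017820 Source: Zhihu. Copyright belongs to the author; contact the author for permission to repost. I like this topic; laying out facts and reasoning is my strong suit. Once I have put down the...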

Posted by Big Data Memo on January 18, 2017

How to use SparkSession in Apache Spark 2.0

pre This post is taken from https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html Generally, a session is an interaction between two or more entities. In compute...

Posted by Big Data Memo on December 29, 2016

NEEQ (New Third Board) text project

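Contents: 1 Text data collection; 2 Text data parsing; 3 Text data analysis; 4 Text data word cloud; 5 Text data classification and other topics. 1 Text data collection. Data source: research reports on the Third Board market. Sample size: everything, i.e. all Third Board research reports published to date. Fields collected: serial number, date, report type, title, institution, author, report content. Collection method: web crawling with the scrapy framework. Storage format: JSON or other...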

Posted by Big Data Memo on December 26, 2016

NEEQ (New Third Board) clustering project

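Contents: 1 Demo: determining the number of clusters; 2 Clustering demo; 3 Reading data from MySQL with Spark; 4 Other. 1 Demo: determining the number of clusters. 1.1 How the K-means clustering algorithm works. Cluster analysis is an unsupervised learning process, generally used to group data objects by their feature attributes; it is often applied to customer segmentation, fraud detection, image analysis, and similar fields. K-means is probably the most...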
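To make the grouping idea in the K-means excerpt concrete, here is a minimal pure-Python sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its members). It is not the post's Spark code; the function name `kmeans`, its parameters, and the sample points are assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster equal-length numeric tuples; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (squared Euclidean distance)
        assignments = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return centroids, assignments


# Made-up sample data: two visibly separate groups
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = kmeans(data, k=2)
print(centers, labels)
```

The number of clusters `k` is an input here, which is why the contents above list a separate demo for choosing it.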

Posted by Big Data Memo on December 26, 2016

wordcloud

Reference: Word Clouds and Language Models: Visualizing Text. I’m going to do this demonstration in two stages: First we will build a word cloud using TF-IDF weights on a relatively small corpus of do...
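As a rough illustration of the TF-IDF weights mentioned in the excerpt, the sketch below scores each term by its frequency within a document times the log of its inverse document frequency. This is one common variant and may differ from the weighting used in the referenced post; the function name `tfidf`, the whitespace tokenizer, and the sample documents are assumptions.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample corpus
docs = [
    "the spark job counts the words",
    "the word cloud shows the text",
    "the model weights the rare terms",
]
for doc_weights in tfidf(docs):
    # Highest-weighted terms would be drawn largest in a word cloud
    print(sorted(doc_weights, key=doc_weights.get, reverse=True)[:3])
```

A term that occurs in every document, such as "the" in this sample, gets weight zero, which is why TF-IDF tends to push filler words out of the cloud.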

Posted by Big Data Memo on December 25, 2016

Experts list four benefits and three drawbacks of having a second child

China Ends One-Child Policy, Allowing Families Two Children. 1 Opening. As China ends its one-child policy, some parents ponder the pros and cons of having a second child...

Posted by Big Data Memo on December 16, 2016

Spark-ML-0302-Naive Bayes

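Naive Bayes. 1 Introduction. Naive Bayes is a simple technique for constructing classifiers. The classifier model assigns class labels, drawn from a finite set, to problem instances that are represented by feature values. It is not a single algorithm for training such classifiers but a family of algorithms based on the same principle: every naive Bayes classifier assumes that each feature of a sample is independent of the other features. For example, a fruit that is red, round, and about 3 inches in diameter may be judged to be an apple. Although these features may depend on one another, or some may be determined by others, ...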
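The independence assumption in the excerpt can be sketched in a few lines: the score of a class is its prior multiplied by one per-feature probability for each feature, with no interaction terms. This is a minimal categorical version with Laplace smoothing, not the Spark ML implementation the post title refers to; the function names, the tiny training set, and the discretized `size` feature (standing in for the diameter) are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (features_dict, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (label, feature) -> counts of each value
    feature_values = defaultdict(set)    # feature -> all values seen in training
    for features, label in samples:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            feature_values[name].add(value)
    return label_counts, value_counts, feature_values


def predict(model, features):
    """Return the label with the highest log prior plus summed log likelihoods."""
    label_counts, value_counts, feature_values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        for name, value in features.items():
            seen = value_counts[(label, name)][value]
            # Laplace (add-one) smoothing so unseen values do not zero out the score
            score += math.log((seen + 1) / (count + len(feature_values[name])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data
fruits = [
    ({"color": "red", "shape": "round", "size": "medium"}, "apple"),
    ({"color": "green", "shape": "round", "size": "medium"}, "apple"),
    ({"color": "red", "shape": "round", "size": "small"}, "cherry"),
    ({"color": "red", "shape": "round", "size": "small"}, "cherry"),
    ({"color": "yellow", "shape": "long", "size": "medium"}, "banana"),
    ({"color": "yellow", "shape": "long", "size": "medium"}, "banana"),
]
model = train(fruits)
print(predict(model, {"color": "red", "shape": "round", "size": "medium"}))
```

Each feature contributes its own term to the sum of logs, so color, shape, and size are treated as separate pieces of evidence even when they are correlated in reality.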

Posted by Big Data Memo on December 5, 2016