
How to Write a Spelling Corrector

Natural language processing tutorial - Vik's Blog

Introduction: This will serve as an introduction to natural language processing. I adapted it from slides for a recent talk at Boston Python. We will go from tokenization to feature extraction to creating a model using a machine learning algorithm. The goal is to provide a reasonable baseline on top of which more complex natural language processing can be done, and provide a good introduction to the material. The examples in this code are done in R, but are easily translatable to other languages.

Training set example: Let's say that I wanted to give a survey today and ask the following question: Why do you want to learn about machine learning? The responses might look like this:

1. I like solving interesting problems.
2. What is machine learning?

Let's say that the survey also asks people to rate their interest on a scale of 0 to 2. We would now have text and associated scores.

First steps: What is the algorithm doing?

Tokenization: Let's tokenize the first survey response.

Bag of words model
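The tokenization and bag-of-words steps named in the excerpt above can be sketched in a few lines. The original tutorial works in R; this is a Python approximation, not the author's code. The function names `tokenize` and `bag_of_words` are hypothetical, and the tokenizer is assumed to simply lowercase the text and keep runs of letters.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix): one row of word counts per document."""
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is every distinct token, sorted for a stable column order.
    vocabulary = sorted({token for doc in tokenized for token in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# The two survey responses quoted in the excerpt.
responses = [
    "I like solving interesting problems.",
    "What is machine learning?",
]
vocabulary, matrix = bag_of_words(responses)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row of `matrix` is a feature vector for one response; paired with the 0 to 2 interest scores, rows like these are the kind of input a machine learning model could be trained on.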

Python Visualization Libraries List ggplot ggplot is a plotting system for Python based on R's ggplot2 and the Grammar of Graphics. It is built for making professional-looking plots quickly with minimal code. Seaborn Seaborn is a library for making attractive and informative statistical graphics in Python. Seaborn offers: matplotlib matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Bokeh Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. pygal pygal is a dynamic SVG charting library. python-igraph igraph is a collection of network analysis tools with the emphasis on efficiency, portability and ease of use. python-igraph is a Python interface to igraph. Graph plotting functionality is provided by the Cairo library. This is part of a community-edited list at Pansop.

delivery.acm.org/10.1145/1880000/1870771/p1162-dinu.pdf?ip=88.117.244.122&acc=OPEN&key=1B55DF923F77674F55057ED4F3766CA0&CFID=334150164&CFTOKEN=85917798&__acm__=1370126273_c43f507c685c5242b4d75f461f31a99d

natural language processing blog

A guide to analyzing Python performance « Huy Nguyen

While it’s not always the case that every Python program you write will require a rigorous performance analysis, it is reassuring to know that there are a wide variety of tools in Python’s ecosystem that one can turn to when the time arises. Analyzing a program’s performance boils down to answering 4 basic questions: How fast is it running? Where are the speed bottlenecks? Below, we’ll dive into the details of answering these questions using some awesome tools.

Coarse grain timing with time: Let’s begin by using a quick and dirty method of timing our code: the good old Unix utility time.

    $ time python yourprogram.py
    real 0m1.028s
    user 0m0.001s
    sys 0m0.003s

The meaning of the three output measurements is detailed in this Stack Overflow article, but in short:

real - refers to the actual elapsed time
user - refers to the amount of CPU time spent outside of the kernel
sys - refers to the amount of CPU time spent inside kernel-specific functions

Fine grain timing with a timing context manager
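A timing context manager of the kind the last heading refers to might look like the sketch below. This is not the article's own implementation: the name `timer`, its `label` parameter, and the dictionary used to hand back the elapsed time are all assumptions made for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label="block"):
    """Measure the wall-clock time spent inside the `with` body."""
    result = {"elapsed": None}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Record and report the time even if the body raised an exception.
        result["elapsed"] = time.perf_counter() - start
        print(f"{label}: {result['elapsed']:.6f} s")


# Usage: wrap only the section you want to time.
with timer("sum of squares") as t:
    total = sum(i * i for i in range(100_000))
```

Unlike the `time` utility, which measures the whole process, this reports elapsed time for one chosen section of code, so several sections can be timed separately within a single run.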

cryptanalysis - Recommended skills for a job in cryptology - Cryptography Stack Exchange It seems we have aligned interests. I'm also a university student (although I am a math/comp sci double major) looking to pursue a career in cryptography. To that end, I have been self-studying it for a while now. So, take what I say with a grain of salt. From what I can best tell, the requisite knowledge of computer science is entirely dependent on what you want to do with cryptography exactly. The reasons for this are many-fold. Further, you will need a strong knowledge of C, which sits so close to the hardware that it isn't too far a step away anyways. To that end, I would really recommend you pick up a minor in computer science if you are going to work with applied cryptography at all. Of course, to get very far in crypto, you will have to have a strong understanding of mathematics. The HAC lists probability theory, information theory, complexity theory, number theory, and abstract algebra as being introductory background material, and the rabbit hole just goes deeper and deeper.

Machine Learning With Python In this post, I'd like to share a few awesome machine-learning toolkits developed in Python. Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible. PyML: machine learning in Python. PyML is an interactive object-oriented framework for machine learning written in Python. PyML focuses on SVMs and other kernel methods. mlpy: mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems, and it is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, works with Python 2 and 3, and is open source, distributed under the GNU General Public License version 3. scikit-learn: machine learning in Python. NLTK: The Natural Language Toolkit (NLTK) is an open source Python library for Natural Language Processing. Theano. SluggerML: baseball stats!

Handbook of Applied Cryptography Alfred J. Menezes, CRC Press ISBN: 0-8493-8523-7 October 1996, 816 pages Fifth Printing (August 2001) The Handbook was reprinted (5th printing) in August 2001. The publisher made all the various minor changes and updates we submitted. You can identify the 5th printing of the book by looking for "5 6 7 8 9 0" at the bottom of the page that includes the ISBN number.

7 Python Libraries you should know about (reposted here). In my years of programming in Python and roaming around GitHub's Explore section, I've come across a few libraries that stood out to me as being particularly enjoyable to use. This blog post is an effort to further spread that knowledge. I specifically excluded awesome libs like requests, SQLAlchemy, Flask, fabric, etc. because I think they're already pretty "main-stream".

1. pyquery (with lxml)

    pip install pyquery

For parsing HTML in Python, Beautiful Soup is oft recommended and it does a great job. What immediately stands out is how fast lxml is. So either slow and easy to use or fast and hard to use, right? Wrong! Enter PyQuery. Oh PyQuery you beautiful seductress:

    from pyquery import PyQuery
    page = PyQuery(some_html)
    last_red_anchor = page('#container > a.red:last')

There are some gotchas, like for example that PyQuery, like jQuery, exposes its internals upon iteration, forcing you to re-wrap:

2. dateutil

    pip install python-dateutil

Handling dates is a pain.

3. fuzzywuzzy

5. sh

Done!

Exascale Challenges The emerging exascale computing architecture will not be simply 1000 x today’s petascale architecture. All proposed exascale computer system designs will share some of the following challenges: Processor architecture is still unknown. These challenges represent a change in the computing cost model, from expensive flops coupled with almost free data movement, to free flops coupled with expensive data movement.

language agnostic - What are the lesser known but useful data structures

language agnostic - Learning to Write a Compiler
