In R (thanks to magrittr) you can now perform operations with a more functional piping syntax via %>%. This means that instead of coding this:
> as.Date("2014-01-01")
> as.character(sqrt(12)^2)
You can also do this:
> "2014-01-01" %>% as.Date
> 12 %>% sqrt %>% .^2 %>% as.character
To me this is more readable, and it extends to use cases beyond the dataframe. Does the Python language have support for something similar?
crime_by_state %>% filter(State=="New York", Year==2005) ... from the end of How dplyr replaced my most common R idioms.

Answers:
One possible way of doing this is by using a module called macropy. Macropy allows you to apply transformations to the code that you have written. Thus a | b can be transformed to b(a). This has a number of advantages and disadvantages.

In comparison to the solution mentioned by Sylvain Leroux, the main advantage is that you do not need to create infix objects for the functions you are interested in using -- just mark the areas of code where you intend to use the transformation. Secondly, since the transformation is applied at compile time rather than at runtime, the transformed code suffers no overhead during runtime -- all the work is done when the byte code is first produced from the source code.
The main disadvantages are that macropy requires a certain activation step for it to work (mentioned later). Also, in exchange for the faster runtime, the parsing of the source code is more computationally complex, so the program will take longer to start. Finally, it adds a syntactic style which means that programmers who are not familiar with macropy may find your code harder to understand.
Example Code:
run.py
import macropy.activate
# Activates macropy, modules using macropy cannot be imported before this statement
# in the program.
import target  # import the module using macropy
target.py
from fpipe import macros, fpipe
from macropy.quick_lambda import macros, f
# The `from module import macros, ...` must be used for macropy to know which
# macros it should apply to your code.
# Here two macros have been imported `fpipe`, which does what you want
# and `f` which provides a quicker way to write lambdas.

from math import sqrt

# Using the fpipe macro in a single expression.
# The code between the square braces is interpreted as - str(sqrt(12))
print fpipe[12 | sqrt | str]  # prints 3.46410161514

# using a decorator
# All code within the function is examined for `x | y` constructs.
x = 1  # global variable

@fpipe
def sum_range_then_square():
    "expected value (1 + 2 + 3)**2 -> 36"
    y = 4  # local variable
    return range(x, y) | sum | f[_**2]
    # `f[_**2]` is macropy syntax for -- `lambda x: x**2`, which would also work here

print sum_range_then_square()  # prints 36

# using a with block.
# same as a decorator, but for limited blocks.
with fpipe:
    print range(4) | sum        # prints 6
    print 'a b c' | f[_.split()]  # prints ['a', 'b', 'c']
And finally, the module that does the hard work. I've called it fpipe for functional pipe, as it's emulating shell syntax for passing output from one process to another.
fpipe.py
from macropy.core.macros import *
from macropy.core.quotes import macros, q, ast

macros = Macros()

@macros.decorator
@macros.block
@macros.expr
def fpipe(tree, **kw):

    @Walker
    def pipe_search(tree, stop, **kw):
        """Search code for bitwise or operators and transform `a | b` to `b(a)`."""
        if isinstance(tree, BinOp) and isinstance(tree.op, BitOr):
            operand = tree.left
            function = tree.right
            newtree = q[ast[function](ast[operand])]
            return newtree

    return pipe_search.recurse(tree)
Pipes are a new feature in Pandas 0.16.2.
Example:
import pandas as pd
from sklearn.datasets import load_iris

x = load_iris()
x = pd.DataFrame(x.data, columns=x.feature_names)

def remove_units(df):
    df.columns = pd.Index(map(lambda x: x.replace(" (cm)", ""), df.columns))
    return df

def length_times_width(df):
    df['sepal length*width'] = df['sepal length'] * df['sepal width']
    df['petal length*width'] = df['petal length'] * df['petal width']

x.pipe(remove_units).pipe(length_times_width)
x
NB: The Pandas version retains Python's reference semantics. That's why length_times_width doesn't need a return value; it modifies x in place.
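As a small variation of my own (not part of the answer above, reusing x and remove_units from the example): if you prefer each step to be side-effect free so that the result of .pipe() keeps flowing, return the frame instead of mutating it in place:

def length_times_width(df):
    # work on a copy and return it, so .pipe() yields a DataFrame you can keep chaining
    df = df.copy()
    df['sepal length*width'] = df['sepal length'] * df['sepal width']
    df['petal length*width'] = df['petal length'] * df['petal width']
    return df

x = x.pipe(remove_units).pipe(length_times_width)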
PyToolz [doc] allows arbitrarily composable pipes, although they aren't defined with that pipe-operator syntax.
Follow the above link for the quickstart. And here's a video tutorial: http://pyvideo.org/video/2858/functional-programming-in-python-with-pytoolz
In [1]: from toolz import pipe

In [2]: from math import sqrt

In [3]: pipe(12, sqrt, str)
Out[3]: '3.4641016151377544'
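Since toolz pipelines are plain function composition, the same chain can also be packaged as a reusable function with toolz.compose, which applies its arguments right to left (a small addition of mine, not part of the original answer):

In [4]: from toolz import compose

In [5]: sqrt_then_str = compose(str, sqrt)   # equivalent to lambda x: str(sqrt(x))

In [6]: sqrt_then_str(12)
Out[6]: '3.4641016151377544'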
"more functional piping syntax" is this really a more "functional" syntax ? I would say it adds an "infix" syntax to R instead.
That being said, the Python's grammar does not have direct support for infix notation beyond the standard operators.
If you really need something like that, you should take that code from Tomer Filiba as a starting point to implement your own infix notation:
If you just want this for personal scripting, you might want to consider using Coconut instead of Python.
Coconut is a superset of Python. You could therefore use Coconut's pipe operator |>, while completely ignoring the rest of the Coconut language.

For example:
def addone(x): x + 1
3 |> addone
compiles to
# lots of auto-generated header junk

# Compiled Coconut: -----------------------------------------------------------

def addone(x): return x + 1
(addone)(3)
print(1 |> isinstance(int)) ... TypeError: isinstance expected 2 arguments, got 1

print(1 |> isinstance$(int)), or preferably, 1 |> isinstance$(int) |> print.

1 |> print$(2) calls print(2, 1), since $ maps to Python partials. But I want print(1, 2), which matches UFCS and magrittr. Motivation: 1 |> add(2) |> divide(6) should be 0.5, and I should not need parentheses.

1 |> isinstance$(?, int) |> print. For your other examples: 1 |> print$(?, 2), 1 |> (+)$(?, 2) |> (/)$(?, 6). I don't think you can avoid parentheses for partial application.

Given how awkward |> and (+)$(?, 2) are, I've come to the conclusion that the programming-language and math establishment does not want me to use this type of syntax, and makes it even uglier than resorting to a set of parentheses. I would use it if it had better syntax (e.g. Dlang has UFCS, but IDK about arithmetic functions, or if Python had a pipe operator).

There is a dfply module. You can find more information at https://github.com/kieferk/dfply
Some examples are:
from dfply import *

diamonds >> group_by('cut') >> row_slice(5)
diamonds >> distinct(X.color)
diamonds >> filter_by(X.cut == 'Ideal', X.color == 'E', X.table < 55, X.price < 500)
diamonds >> mutate(x_plus_y=X.x + X.y, y_div_z=(X.y / X.z)) >> select(columns_from('x')) >> head(3)
dfply and dplython seem to be the same package. Is there any difference between them?

@BigDataScientist The dfply, dplython, and plydata packages are Python ports of the dplyr package, so they are going to be pretty similar in syntax.

I missed the |> pipe operator from Elixir, so I created a simple function decorator (~50 lines of code) that reinterprets the >> Python right-shift operator as a very Elixir-like pipe at compile time using the ast library and compile/exec:

from pipeop import pipes

def add3(a, b, c):
    return a + b + c

def times(a, b):
    return a * b

@pipes
def calc():
    print(1 >> add3(2, 3) >> times(4))  # prints 24
All it's doing is rewriting a >> b(...) as b(a, ...).

https://pypi.org/project/pipeop/
https://github.com/robinhilliard/pipes
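To make the mechanism concrete, here is a rough sketch (my own illustration of the idea, not the actual pipeop source) of how such a decorator can be built with the ast module and compile/exec:

import ast
import inspect
import textwrap

class _PipeRewriter(ast.NodeTransformer):
    """Rewrite `a >> b(...)` nodes into `b(a, ...)`."""
    def visit_BinOp(self, node):
        self.generic_visit(node)          # rewrite nested pipes first
        if isinstance(node.op, ast.RShift) and isinstance(node.right, ast.Call):
            node.right.args.insert(0, node.left)
            return node.right
        return node

def pipes(func):
    source = textwrap.dedent(inspect.getsource(func))
    tree = ast.parse(source)
    tree.body[0].decorator_list = []      # drop @pipes so exec doesn't re-apply it
    tree = ast.fix_missing_locations(_PipeRewriter().visit(tree))
    namespace = dict(func.__globals__)
    exec(compile(tree, filename='<pipes>', mode='exec'), namespace)
    return namespace[func.__name__]

def add3(a, b, c):
    return a + b + c

def times(a, b):
    return a * b

@pipes
def calc():
    return 1 >> add3(2, 3) >> times(4)

print(calc())  # prints 24

Because the rewrite happens once, when the decorated function is defined, the piped calls carry no extra overhead at call time.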
You can use the sspipe library. It exposes two objects, p and px. Similar to x %>% f(y, z), you can write x | p(f, y, z), and similar to x %>% .^2 you can write x | px**2.

from sspipe import p, px
from math import sqrt

12 | p(sqrt) | px ** 2 | p(str)
Building pipe with Infix

As hinted at by Sylvain Leroux, we can use the Infix operator to construct an infix pipe. Let's see how this is accomplished.

First, here is the code from Tomer Filiba.
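A wrapper along these lines (a reconstruction based on the widely circulated functools.partial infix recipe, not a verbatim copy of Filiba's post) is enough to support the |pipe| examples below:

from functools import partial

class Infix(object):
    """Wrap a two-argument function so it can be used as x |func| y."""
    def __init__(self, func):
        self.func = func
    def __ror__(self, other):
        # x | func  ->  partially apply func to the left operand
        return Infix(partial(self.func, other))
    def __or__(self, other):
        # (x | func) | y  ->  call the partially applied function on y
        return self.func(other)
    def __call__(self, v1, v2):
        # plain call syntax still works: func(x, y)
        return self.func(v1, v2)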
The pipe operator passes the preceding object as an argument to the object that follows the pipe, so x %>% f can be transformed into f(x). Consequently, the pipe operator can be defined using Infix as follows:

In [1]: @Infix
   ...: def pipe(x, f):
   ...:     return f(x)
   ...:
   ...:

In [2]: from math import sqrt

In [3]: 12 |pipe| sqrt |pipe| str
Out[3]: '3.4641016151377544'
A note on partial application

The %>% operator from dplyr pushes arguments through the first argument in a function, so

df %>% filter(x >= 2) %>% mutate(y = 2*x)

corresponds to

df1 <- filter(df, x >= 2)
df2 <- mutate(df1, y = 2*x)
The easiest way to achieve something similar in Python is to use currying. The toolz library provides a curry decorator function that makes constructing curried functions easy.

In [2]: from toolz import curry

In [3]: from datetime import datetime

In [4]: @curry
   ...: def asDate(format, date_string):
   ...:     return datetime.strptime(date_string, format)
   ...:
   ...:

In [5]: "2014-01-01" |pipe| asDate("%Y-%m-%d")
Out[5]: datetime.datetime(2014, 1, 1, 0, 0)
Notice that |pipe| pushes the arguments into the last argument position, that is

x |pipe| f(2)

corresponds to

f(2, x)

When designing curried functions, static arguments (i.e. arguments that might be reused across many examples) should be placed earlier in the parameter list.
Note that toolz includes many pre-curried functions, including various functions from the operator module.

In [11]: from toolz.curried import map

In [12]: from toolz.curried.operator import add

In [13]: range(5) |pipe| map(add(2)) |pipe| list
Out[13]: [2, 3, 4, 5, 6]
which roughly corresponds to the following in R
> library(dplyr)
> add2 <- function(x) {x + 2}
> 0:4 %>% sapply(add2)
[1] 2 3 4 5 6
Using other infix delimiters

You can change the symbols that surround the Infix invocation by overriding other Python operator methods. For example, switching __or__ and __ror__ to __mod__ and __rmod__ will change the | operator to the mod operator.

In [5]: 12 %pipe% sqrt %pipe% str
Out[5]: '3.4641016151377544'
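A minimal sketch of that variant (the name ModInfix is just an illustrative label of mine), using the same pipe function as before:

from functools import partial
from math import sqrt

class ModInfix(object):
    """Like Infix above, but bound to the % operator via __mod__/__rmod__."""
    def __init__(self, func):
        self.func = func
    def __rmod__(self, other):
        # 12 % pipe  ->  partially apply pipe to 12
        return ModInfix(partial(self.func, other))
    def __mod__(self, other):
        # ... % sqrt  ->  call the partially applied function on sqrt
        return self.func(other)

@ModInfix
def pipe(x, f):
    return f(x)

print(12 %pipe% sqrt %pipe% str)   # prints 3.4641016151377544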
Adding my 2c. I personally use the fn package for functional-style programming. Your example translates into
from fn import F, _
from math import sqrt

(F(sqrt) >> _**2 >> str)(12)
F is a wrapper class with functional-style syntactic sugar for partial application and composition. _ is a Scala-style constructor for anonymous functions (similar to Python's lambda); it represents a variable, hence you can combine several _ objects in one expression to get a function with more arguments (e.g. _ + _ is equivalent to lambda a, b: a + b). F(sqrt) >> _**2 >> str results in a Callable object that can be used as many times as you want.
_ is not 100% flexible: it doesn't support all Python operators. Additionally, if you plan on using _ in an interactive session, you should import it under another name (e.g. from fn import _ as var), because most (if not all) interactive Python shells use _ to represent the last unassigned return value, thus shadowing the imported object.

There is no need for third-party libraries or confusing operator trickery to implement a pipe function - you can get the basics going quite easily yourself.
Let's start by defining what a pipe function actually is. At its heart, it is just a way to express a series of function calls in logical order, rather than the standard 'inside out' order.

For example, let's look at these functions:
def one(value):
    return value

def two(value):
    return 2*value

def three(value):
    return 3*value
Not very interesting, but assume interesting things are happening to value. We want to call them in order, passing the output of each to the next. In vanilla Python that would be:

result = three(two(one(1)))
It is not incredibly readable, and for more complex pipelines it's going to get worse. So, here is a simple pipe function which takes an initial argument and a series of functions to apply to it:
def pipe(first, *args):
    for fn in args:
        first = fn(first)
    return first
Let's call it:
result = pipe(1, one, two, three)
That looks like very readable 'pipe' syntax to me :). I don't see how it is any less readable than overloading operators or anything like that. In fact, I would argue that it is more readable Python code.
Here is the humble pipe solving the OP's examples:
from math import sqrt
from datetime import datetime

def as_date(s):
    return datetime.strptime(s, '%Y-%m-%d')

def as_character(value):
    # Do whatever as.character does
    return value

pipe("2014-01-01", as_date)
pipe(12, sqrt, lambda x: x**2, as_character)
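As an aside (my addition, not part of the original answer), the same pipe helper can be written as a one-liner over functools.reduce:

from functools import reduce

def pipe(first, *args):
    # fold the functions over the initial value, left to right
    return reduce(lambda acc, fn: fn(acc), args, first)

print(pipe(1, one, two, three))   # reusing one, two, three from above; prints 6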
One alternative solution would be to use the workflow tool dask. Though it's not as syntactically fun as...
...it still allows your variable to flow down the chain, and using dask gives the added benefit of parallelization where possible.
Here's how I use dask to accomplish a pipe-chain pattern:
import dask

def a(foo):
    return foo + 1

def b(foo):
    return foo / 2

def c(foo, bar):
    return foo + bar

# pattern = 'name_of_behavior': (method_to_call, variables_to_pass_in, variables_can_be_task_names)
workflow = {'a_task': (a, 1),
            'b_task': (b, 'a_task',),
            'c_task': (c, 99, 'b_task'),}

#dask.visualize(workflow) #visualization available.

dask.get(workflow, 'c_task')  # returns 100
After having worked with Elixir, I wanted to use the piping pattern in Python. This isn't exactly the same pattern, but it's similar and, like I said, comes with the added benefit of parallelization; if you tell dask to get a task in your workflow which isn't dependent upon others to run first, they'll run in parallel.

If you wanted easier syntax, you could wrap it in something that would take care of the naming of the tasks for you. Of course, in this situation you'd need all functions to take the pipe as the first argument, and you'd lose any benefit of parallelization. But if you're ok with that, you could do something like this:
def dask_pipe(initial_var, functions_args):
    '''
    call the dask_pipe with an init_var, and a list of functions
    workflow, last_task = dask_pipe(initial_var, {function_1:[], function_2:[arg1, arg2]})
    workflow, last_task = dask_pipe(initial_var, [function_1, function_2])
    dask.get(workflow, last_task)
    '''
    workflow = {}
    if isinstance(functions_args, list):
        for ix, function in enumerate(functions_args):
            if ix == 0:
                workflow['task_' + str(ix)] = (function, initial_var)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1))
        return workflow, 'task_' + str(ix)
    elif isinstance(functions_args, dict):
        for ix, (function, args) in enumerate(functions_args.items()):
            if ix == 0:
                workflow['task_' + str(ix)] = (function, initial_var)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1), *args)
        return workflow, 'task_' + str(ix)

# piped functions
def foo(df):
    return df[['a','b']]

def bar(df, s1, s2):
    return df.columns.tolist() + [s1, s2]

def baz(df):
    return df.columns.tolist()

# setup
import dask
import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[1,2,3], 'c':[1,2,3]})
Now, with this wrapper, you can make a pipe following either of these syntactical patterns:
# wf, lt = dask_pipe(initial_var, [function_1, function_2])
# wf, lt = dask_pipe(initial_var, {function_1:[], function_2:[arg1, arg2]})
like this:
# test 1 - lists for functions only:
workflow, last_task = dask_pipe(df, [foo, baz])
print(dask.get(workflow, last_task))  # returns ['a','b']

# test 2 - dictionary for args:
workflow, last_task = dask_pipe(df, {foo:[], bar:['string1', 'string2']})
print(dask.get(workflow, last_task))  # returns ['a','b','string1','string2']
There is a very nice pipe module here: https://pypi.org/project/pipe/ It overloads the | operator and provides a lot of pipe functions like add, first, where, tail, etc.

>>> [1, 2, 3, 4] | where(lambda x: x % 2 == 0) | add
6
>>> sum([1, [2, 3], 4] | traverse)
10
Plus it's very easy to write your own pipe functions:
from math import sqrt
from pipe import Pipe   # decorator from the pipe library

@Pipe
def p_sqrt(x):
    return sqrt(x)

@Pipe
def p_pr(x):
    print(x)

9 | p_sqrt | p_pr