Extracting multiple attributes from a tag using the shell
I'm trying to extract the two attributes "lat" and "lon" from a file with the following format:
<trkpt lat="38.8577288" lon="-9.0997973"/>
<trkpt lat="38.8576367" lon="-9.1000557"/>
<trkpt lat="38.8575259" lon="-9.1006374"/>
...
and get the following output:
-9.0997973,38.8577288
-9.1000557,38.8576367
-9.1006374,38.8575259
(Yes, the lat/lon pairs are inverted on purpose.)
I don't know much about regex, but after looking around on the web, this is all I was able to achieve:
grep 'lat="[^"]*"' doc.txt | grep -no 'lat="[^"]*"'
output:
1:lat="38.8577288"
2:lat="38.8576367"
3:lat="38.8575259"
I'm not sure how to proceed from here. Thanks in advance for your help.
3 solutions
#1
0
Try using Python like so:
python -c 'import re; open("dest", "w").write("\n".join(lon + "," + lat for lat, lon in re.findall(r"""<trkpt lat="([-0-9.]+)" lon="([-0-9.]+)"/>""", open("source").read())))'
where dest is the path to the output file that will contain the comma-separated coordinate pairs, and source is the path to the input file containing the XML-style tags. (This is meant for use in a Linux shell.) Note that I've assumed the format of the input tags is very consistent: lat always comes before lon, with exactly one space between them.
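The regex in there is <trkpt lat="([-0-9.]+)" lon="([-0-9.]+)"/>. Each parenthesized group captures one attribute value: a run of digits, minus signs, and dots. (Inside a character class the dot is literal, with or without a backslash in front of it.)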
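To see what the pattern captures, here is a small sketch that runs it against the three sample lines from the question. The names sample, TRKPT, pairs and lines are only illustrative; the swap to lon,lat follows the output the question asks for.

```python
import re

# Sample lines copied from the question.
sample = '''<trkpt lat="38.8577288" lon="-9.0997973"/>
<trkpt lat="38.8576367" lon="-9.1000557"/>
<trkpt lat="38.8575259" lon="-9.1006374"/>'''

# The same pattern, written as a raw string.
TRKPT = re.compile(r'<trkpt lat="([-0-9.]+)" lon="([-0-9.]+)"/>')

# With two groups in the pattern, findall returns one (lat, lon) tuple per tag.
pairs = TRKPT.findall(sample)

# Put lon first, as the question requests.
lines = [lon + "," + lat for lat, lon in pairs]
print("\n".join(lines))
```

Because findall yields the groups in pattern order, the swap happens only when the output line is built, so the pattern itself can stay in the same order as the input.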
If you don't have a Linux shell handy, or you'd rather run a Python script or work interactively, the same logic is easier to read when spread over several lines:
#!/usr/bin/env python
# use the regex module
import re

# read in the file
with open('source') as f:
    in_file = f.read()
# find matches using regex; each match is a (lat, lon) tuple
matches = re.findall(r'<trkpt lat="([-0-9.]+)" lon="([-0-9.]+)"/>', in_file)
# make new file lines, putting lon first as the question asks
out_lines = [lon + ',' + lat for lat, lon in matches]
# convert list of strings to a single string
out_text = '\n'.join(out_lines)
# output to new file
with open('dest', 'w') as f:
    f.write(out_text)
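The answer assumes a very consistent tag format. If the attributes might appear in a different order, or with extra whitespace or additional attributes, one possible variation is to match each trkpt tag first and then read its attributes into a dict. This is only a sketch: the function name extract_lon_lat and the two patterns are hypothetical, and a real XML parser would be more robust for full GPX files.

```python
import re

# Matches an opening (or self-closing) trkpt tag and captures its attribute text.
TAG = re.compile(r'<trkpt\b([^>]*)>')
# Matches name="value" pairs inside that attribute text.
ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def extract_lon_lat(text):
    """Return a list of 'lon,lat' strings, one per trkpt tag that has both attributes."""
    rows = []
    for tag_body in TAG.findall(text):
        attrs = dict(ATTR.findall(tag_body))
        # Skip tags that lack either coordinate instead of raising an error.
        if "lat" in attrs and "lon" in attrs:
            rows.append(attrs["lon"] + "," + attrs["lat"])
    return rows


if __name__ == "__main__":
    demo = '<trkpt lon="-9.0997973"  lat="38.8577288"/>'
    print("\n".join(extract_lon_lat(demo)))
```

The trade-off is two regex passes per tag instead of one, in exchange for not depending on attribute order.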