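While cleaning up a colleague's blog, I found that its pages still used the old-style Google Analytics tracking code, which for historical reasons is spread over several tags, like this: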
<style type='text/css'>img#wpstats{display:none}</style><!-- Google Analytics Tracking by Google Analyticator 6.5.0: http://www.videousermanuals.com/google-analyticator/ -->
<script type="text/javascript">
var analyticsFileTypes = [''];
var analyticsSnippet = 'enabled';
var analyticsEventTracking = 'enabled';
</script>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXXXXX-X']);
_gaq.push(['_addDevId', 'XXXXX']); // Google Analyticator App ID with Google
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
I wanted to delete the two <script>..</script> tags that start at the comment
<!-- Google Analytics Tracking by Google Analyticator 6.5.0: http://www.videousermanuals.com/google-analyticator/ -->
(the comment itself included).
I used this Perl command:
perl -i -pe 'BEGIN{undef $/;} s/<!-- Google Analytics Tracking by.*?<\/script>.*?<\/script>//sm' $(grep -lR ga.js *)
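Two other parts of the command matter as well: BEGIN{undef $/;} makes perl -p read each file as a single record instead of line by line, and the /s modifier lets . match newlines (the m modifier is not actually needed, since the pattern uses no ^ or $). Below is a minimal sketch of why both are required. It is written in JavaScript rather than Perl, assuming JS's s (dotAll) flag and lazy quantifiers behave like Perl's for this pattern; the cut-down page is hypothetical.

```javascript
// Same pattern as the Perl one-liner. The "s" (dotAll) flag plays the
// role of Perl's /s: "." may also match "\n".
const BLOCK = /<!-- Google Analytics Tracking by.*?<\/script>.*?<\/script>/s;
// Same pattern without "s": "." stops at line breaks.
const BLOCK_NO_DOTALL = /<!-- Google Analytics Tracking by.*?<\/script>.*?<\/script>/;

// Hypothetical, cut-down page; the script bodies are placeholders.
const page = [
  "<head>",
  "<!-- Google Analytics Tracking by Google Analyticator 6.5.0 -->",
  '<script type="text/javascript">',
  "var analyticsSnippet = 'enabled';",
  "</script>",
  '<script type="text/javascript">',
  "var _gaq = _gaq || [];",
  "</script>",
  "</head>",
].join("\n");

// Line by line, like perl -pe with the default $/ = "\n":
// no single line holds the whole block, so nothing is removed.
const perLine = page
  .split("\n")
  .map((line) => line.replace(BLOCK, ""))
  .join("\n");

// Whole file as one string but without dotAll: still no match.
const noDotAll = page.replace(BLOCK_NO_DOTALL, "");

// Whole file as one string (like BEGIN{undef $/;}) plus dotAll: removed.
const wholeFile = page.replace(BLOCK, "");

console.log(perLine === page); // true
console.log(noDotAll === page); // true
console.log(JSON.stringify(wholeFile)); // "<head>\n\n</head>"
```

Run it with Node.js: only the last variant, whole-file input combined with the dotAll flag, actually strips the block.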
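The key point of the command is to use non-greedy matching (.*?) rather than greedy matching (.*). Because BEGIN{undef $/;} turns each whole file into one string, a greedy .* would stretch to the last </script> in the file and delete every script and all page content in between.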
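A small demonstration of the difference, again sketched in JavaScript on the assumption that its lazy and greedy quantifiers behave like Perl's here. The page is hypothetical: the two Analyticator scripts, then post content and one more script that must survive.

```javascript
// Non-greedy pattern (as in the Perl command) and a greedy variant.
const lazy = /<!-- Google Analytics Tracking by.*?<\/script>.*?<\/script>/s;
const greedy = /<!-- Google Analytics Tracking by.*<\/script>.*<\/script>/s;

// Hypothetical page content.
const page =
  "<!-- Google Analytics Tracking by Google Analyticator 6.5.0 -->\n" +
  "<script>var a = 1;</script>\n" +
  "<script>var b = 2;</script>\n" +
  "<p>post body</p>\n" +
  "<script>var c = 3;</script>\n" +
  "<footer>footer</footer>";

// Lazy: stops at the first and second </script>, as intended.
const withLazy = page.replace(lazy, "");
// Greedy: runs on to the last </script>, eating the post body too.
const withGreedy = page.replace(greedy, "");

console.log(JSON.stringify(withLazy));
// "\n<p>post body</p>\n<script>var c = 3;</script>\n<footer>footer</footer>"
console.log(JSON.stringify(withGreedy));
// "\n<footer>footer</footer>"
```

With the greedy pattern the post body and the unrelated third script disappear, which is exactly the damage the non-greedy version avoids.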