#archiveteam-bs 2017-06-03,Sat

↑back Search ←Prev date Next date→ (Showing only urls - See all)(Click on time to show url line in full context)

Who | What | When
<Somebody2> Sanqui: FWIW, here's the nanog-l post: https://mailman.nanog.org/pipermail/nanog/2017-June/091273.html [06:18]
<SketchCow> Some crawls of images at http://www.portalgraphics.net/ appear to have garbage characters in them at the WARC level.
http://web.archive.org/web/20160724223147/http://www.portalgraphics.net/pg/illust/?image_id=23
http://web.archive.org/web/20160725181823/http://www.portalgraphics.net/pg/illust/?image_id=25
http://web.archive.org/web/20160725222705/http://www.portalgraphics.net/pg/illust/?image_id=28
http://web.archive.org/web/20160723205134/http://www.portalgraphics.net/pg/illust/?image_id=31
http://web.archive.org/web/20160725091406/http://www.portalgraphics.net/pg/illust/?image_id=1332
http://web.archive.org/web/20160724001629/http://www.portalgraphics.net/pg/illust/?image_id=10575
http://web.archive.org/web/20160615222159/http://www.portalgraphics.net/pg/illust/?image_id=10575
http://web.archive.org/web/20160725195224/http://www.portalgraphics.net/pg/illust/?image_id=10577
http://web.archive.org/web/20160725161940/http://www.portalgraphics.net/pg/illust/?image_id=10578
http://web.archive.org/web/20160723213647/http://www.portalgraphics.net/pg/illust/?image_id=10581
http://web.archive.org/web/20160725154248/http://www.portalgraphics.net/pg/illust/?image_id=28089
[21:21]
<joepie91> SketchCow: this reminds me of something... mirrors that were run off this WARC that I produced (using wget + warc) used to display some garbage at the top of the page: https://cryptoanarchy.freed0m4all.net/at-cryptoanarchy.warc.gz
SketchCow: here's the original: https://archive.org/details/CryptoAnarchyWarc
[21:29]
qwebirc28: https://i.imgur.com/SJKTZfg.png [21:33]
based on https://archive.org/download/archiveteam_portalgraphics_20160727140857/portalgraphics_20160727140857.megawarc.warc.gz [21:36]
all the garbage on https://web.archive.org/web/20160724001629/http://www.portalgraphics.net/pg/illust/?image_id=10575 is within hex character range [21:38]
it also seems to regularly fail around the same types of values - for example, in https://web.archive.org/web/20160725154248/http://www.portalgraphics.net/pg/illust/?image_id=28089, if you look at the source, you'll see that every time there's an ?image_id= link, it will insert numeric garbage around the ID [21:51]
JRWR-Work: that'd be https://github.com/ArchiveTeam/portalgraphics-grab then [21:53]
<arkiver> https://web.archive.org/web/20160723205134id_/http://www.portalgraphics.net/pg/illust/?image_id=31 is the not rewritten version [22:02]
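For reference, the `id_` modifier mentioned above goes directly after the capture timestamp and asks the Wayback Machine for the capture without link rewriting. A minimal sketch of building such a URL follows; the function name `to_raw_capture` is hypothetical, and the pattern assumes a full 14-digit timestamp like the ones in this log.

```python
import re

# Wayback capture URL: prefix, 14-digit timestamp, optional two-letter
# modifier such as "id_", then the original URL.
# Assumption: the timestamp is always the full 14 digits, as in this log.
_WAYBACK = re.compile(
    r"^(https?://web\.archive\.org/web/)(\d{14})(?:[a-z]{2}_)?/(.+)$"
)


def to_raw_capture(url: str) -> str:
    """Return the capture URL with the `id_` modifier (no link rewriting)."""
    match = _WAYBACK.match(url)
    if match is None:
        raise ValueError(f"not a recognised Wayback capture URL: {url!r}")
    prefix, timestamp, original = match.groups()
    return f"{prefix}{timestamp}id_/{original}"


if __name__ == "__main__":
    print(to_raw_capture(
        "http://web.archive.org/web/20160723205134/"
        "http://www.portalgraphics.net/pg/illust/?image_id=31"
    ))
```

Comparing the rewritten and `id_` responses for the same capture is one way to tell whether the garbage comes from playback rewriting or is already in the stored record.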
<joepie91> <s db cript type="text/javascript" src="http://platform.twitter.com/widgets.js"></script><!--<a href="http://twitter.com/share" class="twitter-share-button" data-count="none" data-via="portalgraphics" data-lang="ja">Tweet</a> [22:15]
<s 69 cript type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"> [22:19]
<s c1 pan>ポタグラ</span>&nbsp;|&nbsp;<a href="https://web.archive.org/web/20160725154248/http://www.portalgraphics.net/oc/">openCanvas</a>&nbsp;|&nbsp;<a href="https://web.archive.org/web/20160725154248/http://www.portalgraphics.net/cl/">コミラボ</a></li> [22:19]
arkiver: here's a report for the last URL: http://sprunge.us/RjWi -- all instances using newlines (not spaces), where the middle line for each case is the garbage value, with a dot for each trailing space, and the first and last lines are the surrounding context [22:23]
arkiver: https://git.cryto.net/joepie91/garbagechecker [22:30]
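The report format described above (context line, garbage line with a dot per trailing space, context line) can be approximated with a short scan. This is only a rough sketch and not the code from the garbagechecker repository: the names `HEX_LINE` and `find_hex_lines` are hypothetical, and it assumes each garbage value sits on its own line and consists only of hex digits plus optional trailing spaces.

```python
import re

# A line consisting only of hex digits, optionally followed by spaces.
# Assumption: garbage values look like this, per the description in the log.
HEX_LINE = re.compile(r"^([0-9a-fA-F]+)( *)$")


def find_hex_lines(text):
    """Return (before, garbage, after) tuples for hex-only lines.

    Trailing spaces in the garbage line are shown as dots. The first and
    last lines are skipped because they lack context on both sides.
    """
    lines = [line.rstrip("\r") for line in text.split("\n")]
    hits = []
    for i in range(1, len(lines) - 1):
        match = HEX_LINE.match(lines[i])
        if match:
            marked = match.group(1) + "." * len(match.group(2))
            hits.append((lines[i - 1], marked, lines[i + 1]))
    return hits


if __name__ == "__main__":
    sample = '<s\r\ndb\r\ncript type="text/javascript">\r\n'
    for before, garbage, after in find_hex_lines(sample):
        print(before)
        print(garbage)
        print(after)
        print()
```

A scan like this will also flag legitimate lines that happen to contain only hex digits (a bare number, or a word such as "face"), so hits need checking by eye.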
