#urlteam 2015-09-18,Fri

Time Nickname Message
00:03 🔗 Coderjoe_ has joined #urlteam
00:05 🔗 Coderjoe has quit IRC (Read error: Operation timed out)
01:01 🔗 JesseW has joined #urlteam
01:02 🔗 Start has joined #urlteam
01:29 🔗 Start could someone please scrape the urlteam results for home.comcast.net, comcastbiz.net and comcastbiz.com?
01:53 🔗 JesseW I can do so, unless someone who has already downloaded them wants to do it...
02:17 🔗 JesseW OK, I'm working on downloading the URLteam results (starting with the 88 GB generation 1 torrent)
02:29 🔗 aaaaaaaa_ has joined #urlteam
02:29 🔗 aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
02:29 🔗 swebb sets mode: +o aaaaaaaa_
02:34 🔗 aaaaaaaa_ is now known as aaaaaaaaa
02:41 🔗 JesseW Interesting -- the 379 incremental items come to a total of 86.5 GB, as compared with the last torrent, which is 88 GB.
03:26 🔗 JesseW Well, I have a few from home.comcast.net, such as
03:26 🔗 JesseW SfwqG0|http://home.comcast.net/~s.namkung/
03:28 🔗 JesseW The incremental ones appear to be .zip files *containing* .xz files, which contain the actual data...
04:05 🔗 JesseW queuing up all 379 incremental dumps via bt...
04:13 🔗 aaaaaaaaa with xargs, I bet!
04:14 🔗 JesseW yep. :-)
04:36 🔗 aaaaaaaa_ has joined #urlteam
04:36 🔗 aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
04:36 🔗 swebb sets mode: +o aaaaaaaa_
04:37 🔗 aaaaaaaa_ is now known as aaaaaaaaa
04:38 🔗 JesseW Here's the pipeline I'm using to search for the comcast URLs:
04:38 🔗 JesseW (cd /mnt/bigdisk/transmission_files/downloads/urlteam_2015-09-17-19-00-08/ ; for foo in *.zip; do echo "$foo"; unzip -p "$foo" '*.txt.xz' | xz -d | fgrep $'home.comcast.net\ncomcastbiz.net\ncomcastbiz.com' | tee -a /mnt/bigdisk/comcast_urls.txt; done)
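[Editor's sketch] The same search can be written in Python, which may be easier to rerun when a download turns out to be damaged. This is only a sketch under assumptions, not JesseW's actual tooling: the function names `matching_lines` and `scan_directory` are hypothetical, the domain list is taken from Start's request earlier in this log, and the archives are assumed to be .zip files holding `*.txt.xz` members, as described at 03:28. Like `fgrep`, it matches plain substrings, so it will also report lines where a domain only appears inside a query string or e-mail address.

```python
import lzma
import zipfile
from pathlib import Path

# Domains from Start's request in this log (substring match, like fgrep).
DOMAINS = ("home.comcast.net", "comcastbiz.net", "comcastbiz.com")


def matching_lines(zip_path, needles=DOMAINS):
    """Yield lines from every *.txt.xz member of zip_path that contain a needle."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt.xz"):
                continue
            # Decompress the xz member as a text stream, line by line.
            with archive.open(name) as member, lzma.open(
                member, "rt", encoding="utf-8", errors="replace"
            ) as text:
                for line in text:
                    if any(needle in line for needle in needles):
                        yield line.rstrip("\n")


def scan_directory(directory, out_path):
    """Append hits from each zip in directory to out_path.

    Returns the names of zip files that could not be read, so that a
    corrupted download does not stop the whole run.
    """
    failed = []
    with open(out_path, "a", encoding="utf-8") as out:
        for zip_path in sorted(Path(directory).glob("*.zip")):
            try:
                for line in matching_lines(zip_path):
                    out.write(line + "\n")
            except (zipfile.BadZipFile, lzma.LZMAError, EOFError):
                failed.append(zip_path.name)
    return failed
```

Calling `scan_directory(some_download_dir, "comcast_urls.txt")` appends hits to the output file and returns the list of archives to re-download; both paths here are placeholders.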
04:38 🔗 aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
05:04 🔗 JesseW First hit (and it's not actually a hit)...
05:04 🔗 JesseW JlRq4r|https://www.actonsoftware.com/acton/beacon/xreport.jsp?c=Visit Details&u=https://www.actonsoftware.com/acton/beacon/beaconCompaniesDrillDown.jsp%3Fa%3D248%26aa%3D7968%26k%3D729502%26ips%3D%255B71.194.171.69%255D%26start%3D1335121634700%26email%3DGregory@gdwgroup.comcastbiz.net
05:07 🔗 JesseW Another, actual hit (although probably spam):
05:08 🔗 JesseW YodXKi|http://lifo.comcastbiz.net/25/asbestos-plaster-walls
05:08 🔗 JesseW No, the domain is actually a thing (and presumably worth archiving): http://lifo.comcastbiz.net/
05:08 🔗 JesseW "Life Organizers / Tax & Planning Consultants"
05:35 🔗 JesseW Well, here's something certainly worth saving: http://thediscoverycenter.comcastbiz.net/about/ -- website of a 6-acre privately owned park in Fresno, CA devoted to teaching kids about science since 1956.
07:35 🔗 JesseW has quit IRC (Read error: Operation timed out)
12:31 🔗 dashcloud has quit IRC (Read error: Operation timed out)
12:34 🔗 dashcloud has joined #urlteam
12:35 🔗 svchfoo1 sets mode: +o dashcloud
14:15 🔗 Start has quit IRC (Quit: Disconnected.)
14:16 🔗 Start has joined #urlteam
14:20 🔗 Start has quit IRC (Client Quit)
15:10 🔗 JesseW has joined #urlteam
15:20 🔗 JesseW Start: Here are my results so far; I've run into HW trouble, so I thought it better to provide what I have: http://pastebin.ca/3165040
15:32 🔗 JesseW OK, I've got analysis working again -- about 2000 zip files downloaded, still to be processed. (and various more that I haven't downloaded yet)
15:39 🔗 Start has joined #urlteam
16:09 🔗 Start_ has joined #urlteam
16:09 🔗 Start has quit IRC (Read error: Connection reset by peer)
16:13 🔗 Start_ is now known as Start
16:14 🔗 JesseW has quit IRC (Read error: Operation timed out)
16:32 🔗 JesseW has joined #urlteam
16:34 🔗 JesseW Maybe corrupted file: urlteam_2015-03-22-01-09-14/bitly_6.2015-03-22-01-09-14.zip
17:01 🔗 JesseW Updated results (110, so far): http://pastebin.ca/3165137
17:13 🔗 JesseW has quit IRC (Read error: Operation timed out)
17:38 🔗 Start has quit IRC (Quit: Disconnected.)
18:24 🔗 VADemon has joined #urlteam
18:59 🔗 slang has joined #urlteam
19:08 🔗 dashcloud has quit IRC (Read error: Operation timed out)
19:10 🔗 dashcloud has joined #urlteam
19:10 🔗 svchfoo1 sets mode: +o dashcloud
19:30 🔗 _0x2A has quit IRC (Quit: ZNC - 1.6.0 - http://znc.in)
21:40 🔗 slang has quit IRC (Quit: Page closed)
23:40 🔗 Start has joined #urlteam
