| Time |
Nickname |
Message |
|
00:03
|
|
Coderjoe_ has joined #urlteam |
|
00:05
|
|
Coderjoe has quit IRC (Read error: Operation timed out) |
|
01:01
|
|
JesseW has joined #urlteam |
|
01:02
|
|
Start has joined #urlteam |
|
01:29
|
Start |
could someone please scrape the urlteam results for home.comcast.net, comcastbiz.net and comcastbiz.com? |
|
01:53
|
JesseW |
I can do so, unless someone who has already downloaded them wants to do it... |
|
02:17
|
JesseW |
OK, I'm working on downloading the URLteam results (starting with the 88 GB generation 1 torrent) |
|
02:29
|
|
aaaaaaaa_ has joined #urlteam |
|
02:29
|
|
aaaaaaaaa has quit IRC (Read error: Connection reset by peer) |
|
02:29
|
|
swebb sets mode: +o aaaaaaaa_ |
|
02:34
|
|
aaaaaaaa_ is now known as aaaaaaaaa |
|
02:41
|
JesseW |
Interesting -- the 379 incremental items come to a total of 86.5 GB, as compared with the last torrent, which is 88 GB. |
|
03:26
|
JesseW |
Well, I have a few from home.comcast.net, such as |
|
03:26
|
JesseW |
SfwqG0|http://home.comcast.net/~s.namkung/ |
|
03:28
|
JesseW |
The incremental ones appear to be .zip files *containing* .xz files, which contain the actual data... |
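Going by the description above, each incremental item is a .zip archive whose `.txt.xz` members hold the plain-text dump. A minimal Python sketch of streaming those lines straight out of the archive, roughly what `unzip -p ... | xz -d` does in the pipeline used later in this log. It assumes the dumps are UTF-8 text and that only members ending in `.txt.xz` matter; `iter_dump_lines` is a hypothetical helper name, not part of any URLteam tooling.

```python
import lzma
import zipfile
from typing import BinaryIO, Iterator, Union


def iter_dump_lines(zip_file: Union[str, BinaryIO]) -> Iterator[str]:
    """Yield text lines from every *.txt.xz member of a URLteam-style zip.

    Nothing is extracted to disk: each member is streamed out of the zip
    and decompressed on the fly.
    Assumption: the dumps are UTF-8 text; undecodable bytes are replaced
    instead of aborting the whole scan.
    """
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt.xz"):
                continue  # skip any non-dump members
            # lzma.open accepts an already-open binary file object;
            # "rt" mode wraps the decompressed bytes as text.
            with archive.open(name) as raw, lzma.open(
                raw, "rt", encoding="utf-8", errors="replace"
            ) as text:
                for line in text:
                    yield line.rstrip("\n")
```

For example, `for line in iter_dump_lines("bitly_6.2015-03-22-01-09-14.zip"): ...` would walk one of the items mentioned later in the log. Streaming keeps memory use flat, which matters when the whole set is tens of gigabytes compressed.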
|
04:05
|
JesseW |
queuing up all 379 incremental dumps via bt... |
|
04:13
|
aaaaaaaaa |
with xargs, I bet! |
|
04:14
|
JesseW |
yep. :-) |
|
04:36
|
|
aaaaaaaa_ has joined #urlteam |
|
04:36
|
|
aaaaaaaaa has quit IRC (Read error: Connection reset by peer) |
|
04:36
|
|
swebb sets mode: +o aaaaaaaa_ |
|
04:37
|
|
aaaaaaaa_ is now known as aaaaaaaaa |
|
04:38
|
JesseW |
Here's the pipeline I'm using to search for the comcast URLs: |
|
04:38
|
JesseW |
(cd /mnt/bigdisk/transmission_files/downloads/urlteam_2015-09-17-19-00-08/ && for foo in *.zip; do echo "$foo"; unzip -p "$foo" '*.txt.xz' | xz -d | grep -F $'home.comcast.net\ncomcastbiz.net\ncomcastbiz.com' | tee -a /mnt/bigdisk/comcast_urls.txt; done) |
|
04:38
|
|
aaaaaaaaa has quit IRC (Read error: Connection reset by peer) |
|
05:04
|
JesseW |
First hit (and it's not actually a hit)... |
|
05:04
|
JesseW |
JlRq4r|https://www.actonsoftware.com/acton/beacon/xreport.jsp?c=Visit Details&u=https://www.actonsoftware.com/acton/beacon/beaconCompaniesDrillDown.jsp%3Fa%3D248%26aa%3D7968%26k%3D729502%26ips%3D%255B71.194.171.69%255D%26start%3D1335121634700%26email%3DGregory@gdwgroup.comcastbiz.net |
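The first hit above is a false positive because a plain substring search (`fgrep`) matches `comcastbiz.net` wherever it appears in the line, here inside an email address in the query string. A minimal Python sketch of a stricter filter that parses each URL and compares only its hostname against the requested domains (exact match or subdomain). It assumes lines have the `shortcode|URL` form seen in this log and that URLs include a scheme; `host_matches` and `TARGET_DOMAINS` are hypothetical names.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Hypothetical constant: the domains requested earlier in the channel.
TARGET_DOMAINS = ("home.comcast.net", "comcastbiz.net", "comcastbiz.com")


def host_matches(line: str, domains: Iterable[str] = TARGET_DOMAINS) -> Optional[str]:
    """Return the URL from a 'shortcode|url' line if its host is one of
    `domains` or a subdomain of one; otherwise return None.

    Assumption: shortcode and URL are separated by the first '|'.
    """
    if "|" not in line:
        return None
    _shortcode, url = line.rstrip("\n").split("|", 1)
    try:
        # .hostname is lowercased and has any port/userinfo removed.
        host = urlsplit(url.strip()).hostname
    except ValueError:
        return None  # e.g. a malformed bracketed IPv6 host
    if not host:
        return None  # no scheme, so no parsable host
    host = host.rstrip(".")
    for domain in domains:
        # Require a dot boundary so "notcomcastbiz.net" does not match.
        if host == domain or host.endswith("." + domain):
            return url
    return None
```

Applied to the two hits quoted in this log, it would drop the actonsoftware.com link and keep `http://lifo.comcastbiz.net/25/asbestos-plaster-walls`. The trade-off is that scheme-less entries such as `home.comcast.net/~user` are skipped, so the substring pass may still be worth keeping as a cross-check.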
|
05:07
|
JesseW |
Another, actual hit (although probably spam): |
|
05:08
|
JesseW |
YodXKi|http://lifo.comcastbiz.net/25/asbestos-plaster-walls |
|
05:08
|
JesseW |
No, the domain is actually a thing (and presumably worth archiving): http://lifo.comcastbiz.net/ |
|
05:08
|
JesseW |
"Life Organizers / Tax & Planning Consultants" |
|
05:35
|
JesseW |
Well, here's something certainly worth saving: http://thediscoverycenter.comcastbiz.net/about/ -- the website of a 6-acre, privately owned park in Fresno, CA, devoted to teaching kids about science since 1956. |
|
07:35
|
|
JesseW has quit IRC (Read error: Operation timed out) |
|
12:31
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
12:34
|
|
dashcloud has joined #urlteam |
|
12:35
|
|
svchfoo1 sets mode: +o dashcloud |
|
14:15
|
|
Start has quit IRC (Quit: Disconnected.) |
|
14:16
|
|
Start has joined #urlteam |
|
14:20
|
|
Start has quit IRC (Client Quit) |
|
15:10
|
|
JesseW has joined #urlteam |
|
15:20
|
JesseW |
Start: Here are my results so far; I've run into HW trouble, so I thought it better to provide what I have: http://pastebin.ca/3165040 |
|
15:32
|
JesseW |
OK, I've got analysis working again -- about 2000 zip files are downloaded and still to be processed (plus various more that I haven't downloaded yet). |
|
15:39
|
|
Start has joined #urlteam |
|
16:09
|
|
Start_ has joined #urlteam |
|
16:09
|
|
Start has quit IRC (Read error: Connection reset by peer) |
|
16:13
|
|
Start_ is now known as Start |
|
16:14
|
|
JesseW has quit IRC (Read error: Operation timed out) |
|
16:32
|
|
JesseW has joined #urlteam |
|
16:34
|
JesseW |
Maybe corrupted file: urlteam_2015-03-22-01-09-14/bitly_6.2015-03-22-01-09-14.zip |
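One way to tell whether a zip like the one above is really damaged is to read every member to the end: Python's `zipfile` verifies each member's CRC-32 once it has been fully read, and fully decompressing each `.txt.xz` member surfaces truncated or corrupt xz data. A minimal sketch using only the standard library; `check_dump` and `_drain` are hypothetical names, and the exact error messages depend on the Python version.

```python
import lzma
import zipfile
import zlib
from typing import BinaryIO, List, Union

CHUNK = 1 << 20  # read 1 MiB at a time so large members never sit fully in memory


def _drain(stream: BinaryIO) -> None:
    """Read a stream to EOF and discard the data; damage surfaces as exceptions."""
    while stream.read(CHUNK):
        pass


def check_dump(zip_file: Union[str, BinaryIO]) -> List[str]:
    """Return human-readable problems found in a zip of .txt.xz dumps.

    An empty list means every member was read to the end with a matching
    CRC-32 and every .txt.xz member decompressed cleanly.
    """
    problems: List[str] = []
    try:
        with zipfile.ZipFile(zip_file) as archive:
            for name in archive.namelist():
                try:
                    with archive.open(name) as raw:
                        if name.endswith(".txt.xz"):
                            # Truncated xz data raises EOFError; damaged data raises LZMAError.
                            with lzma.open(raw) as xz_stream:
                                _drain(xz_stream)
                        else:
                            _drain(raw)  # reaching EOF makes zipfile check the CRC-32
                except (zipfile.BadZipFile, lzma.LZMAError, EOFError, zlib.error) as exc:
                    problems.append(f"{name}: {exc}")
    except zipfile.BadZipFile as exc:
        # Raised when the archive itself (e.g. its central directory) is unreadable.
        problems.append(f"not a readable zip: {exc}")
    return problems
```

For instance, `check_dump("urlteam_2015-03-22-01-09-14/bitly_6.2015-03-22-01-09-14.zip")` would either return an empty list or name the members that failed, which helps decide whether to re-download just that item.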
|
17:01
|
JesseW |
Updated results (110, so far): http://pastebin.ca/3165137 |
|
17:13
|
|
JesseW has quit IRC (Read error: Operation timed out) |
|
17:38
|
|
Start has quit IRC (Quit: Disconnected.) |
|
18:24
|
|
VADemon has joined #urlteam |
|
18:59
|
|
slang has joined #urlteam |
|
19:08
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
19:10
|
|
dashcloud has joined #urlteam |
|
19:10
|
|
svchfoo1 sets mode: +o dashcloud |
|
19:30
|
|
_0x2A has quit IRC (Quit: ZNC - 1.6.0 - http://znc.in) |
|
21:40
|
|
slang has quit IRC (Quit: Page closed) |
|
23:40
|
|
Start has joined #urlteam |