Time |
Nickname |
Message |
00:03
|
|
Coderjoe_ has joined #urlteam |
00:05
|
|
Coderjoe has quit IRC (Read error: Operation timed out) |
01:01
|
|
JesseW has joined #urlteam |
01:02
|
|
Start has joined #urlteam |
01:29
|
Start |
could someone please scrape the urlteam results for home.comcast.net, comcastbiz.net and comcastbiz.com? |
01:53
|
JesseW |
I can do so, unless someone who has already downloaded them wants to do it... |
02:17
|
JesseW |
OK, I'm working on downloading the URLteam results (starting with the 88 GB generation 1 torrent) |
02:29
|
|
aaaaaaaa_ has joined #urlteam |
02:29
|
|
aaaaaaaaa has quit IRC (Read error: Connection reset by peer) |
02:29
|
|
swebb sets mode: +o aaaaaaaa_ |
02:34
|
|
aaaaaaaa_ is now known as aaaaaaaaa |
02:41
|
JesseW |
Interesting -- the 379 incremental items come to a total of 86.5 GB, as compared with the last torrent, which is 88 GB. |
03:26
|
JesseW |
Well, I have a few from home.comcast.net, such as |
03:26
|
JesseW |
SfwqG0|http://home.comcast.net/~s.namkung/ |
03:28
|
JesseW |
The incremental ones appear to be .zip files *containing* .xz files, which contain the actual data... |
04:05
|
JesseW |
queuing up all 379 incremental dumps via bt... |
04:13
|
aaaaaaaaa |
with xargs, I bet! |
04:14
|
JesseW |
yep. :-) |
04:36
|
|
aaaaaaaa_ has joined #urlteam |
04:36
|
|
aaaaaaaaa has quit IRC (Read error: Connection reset by peer) |
04:36
|
|
swebb sets mode: +o aaaaaaaa_ |
04:37
|
|
aaaaaaaa_ is now known as aaaaaaaaa |
04:38
|
JesseW |
Here's the pipeline I'm using to search for the comcast URLs: |
04:38
|
JesseW |
(cd /mnt/bigdisk/transmission_files/downloads/urlteam_2015-09-17-19-00-08/ ; for foo in *.zip; do echo "$foo"; unzip -p "$foo" '*.txt.xz' | xz -d | fgrep $'home.comcast.net\ncomcastbiz.net\ncomcastbiz.com' | tee -a /mnt/bigdisk/comcast_urls.txt; done) |
04:38
|
|
aaaaaaaaa has quit IRC (Read error: Connection reset by peer) |
05:04
|
JesseW |
First hit (and it's not actually a hit)... |
05:04
|
JesseW |
JlRq4r|https://www.actonsoftware.com/acton/beacon/xreport.jsp?c=Visit Details&u=https://www.actonsoftware.com/acton/beacon/beaconCompaniesDrillDown.jsp%3Fa%3D248%26aa%3D7968%26k%3D729502%26ips%3D%255B71.194.171.69%255D%26start%3D1335121634700%26email%3DGregory@gdwgroup.comcastbiz.net |
05:07
|
JesseW |
Another, actual hit (although probably spam): |
05:08
|
JesseW |
YodXKi|http://lifo.comcastbiz.net/25/asbestos-plaster-walls |
05:08
|
JesseW |
No, the domain is actually a thing (and, presumably, worth archiving): http://lifo.comcastbiz.net/ |
05:08
|
JesseW |
"Life Organizers / Tax & Planning Consultants" |
05:35
|
JesseW |
Well, here's something certainly worth saving: http://thediscoverycenter.comcastbiz.net/about/ -- website of a 6 acre privately-owned park in Fresno, CA devoted to teaching kids about science since 1956. |
07:35
|
|
JesseW has quit IRC (Read error: Operation timed out) |
12:31
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
12:34
|
|
dashcloud has joined #urlteam |
12:35
|
|
svchfoo1 sets mode: +o dashcloud |
14:15
|
|
Start has quit IRC (Quit: Disconnected.) |
14:16
|
|
Start has joined #urlteam |
14:20
|
|
Start has quit IRC (Client Quit) |
15:10
|
|
JesseW has joined #urlteam |
15:20
|
JesseW |
Start: Here are my results so far; I've run into HW trouble, so I thought it better to provide what I have: http://pastebin.ca/3165040 |
15:32
|
JesseW |
OK, I've got analysis working again -- about 2000 zip files are downloaded and still to be processed (plus various others that I haven't downloaded yet). |
15:39
|
|
Start has joined #urlteam |
16:09
|
|
Start_ has joined #urlteam |
16:09
|
|
Start has quit IRC (Read error: Connection reset by peer) |
16:13
|
|
Start_ is now known as Start |
16:14
|
|
JesseW has quit IRC (Read error: Operation timed out) |
16:32
|
|
JesseW has joined #urlteam |
16:34
|
JesseW |
Maybe corrupted file: urlteam_2015-03-22-01-09-14/bitly_6.2015-03-22-01-09-14.zip |
17:01
|
JesseW |
Updated results (110, so far): http://pastebin.ca/3165137 |
17:13
|
|
JesseW has quit IRC (Read error: Operation timed out) |
17:38
|
|
Start has quit IRC (Quit: Disconnected.) |
18:24
|
|
VADemon has joined #urlteam |
18:59
|
|
slang has joined #urlteam |
19:08
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
19:10
|
|
dashcloud has joined #urlteam |
19:10
|
|
svchfoo1 sets mode: +o dashcloud |
19:30
|
|
_0x2A has quit IRC (Quit: ZNC - 1.6.0 - http://znc.in) |
21:40
|
|
slang has quit IRC (Quit: Page closed) |
23:40
|
|
Start has joined #urlteam |
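
Editor's sketch, not from the channel: the search JesseW describes at 04:38 (zip files containing .xz files, one `shortcode|url` pair per line) could also be written in Python roughly as below. It assumes every dump line has that `shortcode|url` shape, as in the samples pasted in the log. It matches on the URL's host rather than on the whole line, which would skip the 05:04 false positive where comcastbiz.net only appears inside a query string. The function names (`host_matches`, `scan_zip`, `scan_directory`) are hypothetical, and skipping unreadable archives is an assumption prompted by the possibly corrupted file mentioned at 16:34.

```python
import lzma
import zipfile
from pathlib import Path
from urllib.parse import urlsplit

# Domains Start asked about at 01:29.
TARGET_DOMAINS = ("home.comcast.net", "comcastbiz.net", "comcastbiz.com")


def host_matches(url, domains=TARGET_DOMAINS):
    """Return True if the URL's host is a target domain or a subdomain of one."""
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:
        # Malformed URLs (for example bad IPv6 brackets) are treated as non-matches.
        return False
    return any(host == d or host.endswith("." + d) for d in domains)


def scan_zip(zip_source, domains=TARGET_DOMAINS):
    """Yield (shortcode, url) pairs from every *.txt.xz member of one zip.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt.xz"):
                continue
            # Decompress the .xz member as a text stream, line by line.
            with archive.open(name) as member, lzma.open(
                member, "rt", encoding="utf-8", errors="replace"
            ) as text:
                for line in text:
                    shortcode, sep, url = line.rstrip("\n").partition("|")
                    if sep and host_matches(url, domains):
                        yield shortcode, url


def scan_directory(directory, domains=TARGET_DOMAINS):
    """Yield (zip_name, shortcode, url) for every *.zip directly in directory."""
    for zip_path in sorted(Path(directory).glob("*.zip")):
        try:
            for shortcode, url in scan_zip(zip_path, domains):
                yield zip_path.name, shortcode, url
        except (zipfile.BadZipFile, lzma.LZMAError, EOFError) as exc:
            # Hits found before the damaged part are kept; the rest is skipped.
            print(f"skipping {zip_path.name}: {exc}")
```

A call such as `for name, code, url in scan_directory("/mnt/bigdisk/transmission_files/downloads/urlteam_2015-09-17-19-00-08/"): print(f"{code}|{url}")` would print hits in the same `shortcode|url` form as the log. Host matching is stricter than the substring `fgrep`, so it would also drop any line where a target domain appears only in a path or query string.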