| Time |
Nickname |
Message |
|
00:22
🔗
|
|
Stilett0 is now known as Stiletto |
|
00:37
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
00:39
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
01:11
🔗
|
|
JesseW has joined #archiveteam-bs |
|
01:23
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
01:24
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
02:38
🔗
|
|
marvinw_ is now known as ivan` |
|
02:38
🔗
|
ivan` |
JesseW: thanks, grabbing your youtube submissions |
|
02:38
🔗
|
botpie91 |
ivan`: 06 Mar 07:31Z <yipdw> tell ivan` that https://www.youtube.com/watch?v=DydIK14AvXI must be archived |
|
02:39
🔗
|
JesseW |
ivan`: cool, glad to have it |
|
02:41
🔗
|
yipdw |
oh I was joking but |
|
02:43
🔗
|
ivan` |
https://www.youtube.com/watch?v=DEZ1mBA0AVA from the same channel |
|
02:52
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
02:54
🔗
|
|
mismatch_ has quit IRC (Remote host closed the connection) |
|
02:54
🔗
|
|
mismatch_ has joined #archiveteam-bs |
|
02:56
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
02:59
🔗
|
JesseW |
IDK why Vxbinaca (on the wiki) decided to copy a random essay originally written by someone else on their Wikipedia user page into the archiveteam wiki -- but, eh, whatever. |
|
03:22
🔗
|
|
superkuh has quit IRC (Quit: the neuronal action potential is an electrical manipulation of reversible abrupt phase changes in the lipid bilaye) |
|
03:37
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
|
03:40
🔗
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
|
03:41
🔗
|
|
Stolett0 has joined #archiveteam-bs |
|
03:44
🔗
|
|
Stolett0 is now known as Stiletto |
|
03:48
🔗
|
|
Stolett0 has joined #archiveteam-bs |
|
03:58
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
|
03:59
🔗
|
|
vitzli has joined #archiveteam-bs |
|
04:00
🔗
|
|
Stolett0 has quit IRC (Read error: Connection reset by peer) |
|
04:01
🔗
|
|
Stolett0 has joined #archiveteam-bs |
|
04:17
🔗
|
* |
JesseW is enjoying https://archive.org/details/msdos_A_Matter_of_Time_1995 |
|
04:25
🔗
|
|
Stolett0 is now known as Stiletto |
|
04:26
🔗
|
|
Stiletto is now known as Stilett0 |
|
04:26
🔗
|
|
Stilett0 is now known as Stiletto |
|
04:29
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
|
05:07
🔗
|
|
RedType has joined #archiveteam-bs |
|
05:07
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
|
05:10
🔗
|
MrRadar |
LOL: https://twitter.com/jzsavoie/status/706622991006732288 |
|
05:14
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
|
05:14
🔗
|
|
bwn has joined #archiveteam-bs |
|
05:14
🔗
|
|
metalcamp has joined #archiveteam-bs |
|
05:21
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
05:21
🔗
|
|
bzc6p_ has joined #archiveteam-bs |
|
05:21
🔗
|
|
swebb sets mode: +o bzc6p_ |
|
05:22
🔗
|
|
metalcamp has quit IRC (Ping timeout: 258 seconds) |
|
05:26
🔗
|
|
vitzli has quit IRC (Leaving) |
|
05:26
🔗
|
|
vitzli has joined #archiveteam-bs |
|
05:28
🔗
|
|
bzc6p has quit IRC (Read error: Operation timed out) |
|
05:29
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
|
06:19
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
|
06:56
🔗
|
|
fie has quit IRC (Read error: Connection reset by peer) |
|
07:14
🔗
|
|
ndizzle has joined #archiveteam-bs |
|
07:21
🔗
|
|
metalcamp has joined #archiveteam-bs |
|
07:27
🔗
|
|
xXx_ndidd has quit IRC (Read error: Operation timed out) |
|
07:36
🔗
|
|
RichardG has joined #archiveteam-bs |
|
07:50
🔗
|
|
bzc6p_ has left |
|
08:24
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
08:28
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
08:32
🔗
|
|
schbirid has joined #archiveteam-bs |
|
08:39
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
|
08:57
🔗
|
|
metalcamp has quit IRC (Ping timeout: 258 seconds) |
|
09:34
🔗
|
|
bwn has joined #archiveteam-bs |
|
09:38
🔗
|
|
vtyl has quit IRC (Ping timeout: 250 seconds) |
|
09:42
🔗
|
|
lytv has joined #archiveteam-bs |
|
10:22
🔗
|
|
jut has joined #archiveteam-bs |
|
11:09
🔗
|
|
metalcamp has joined #archiveteam-bs |
|
11:45
🔗
|
|
signius has quit IRC (Read error: Operation timed out) |
|
11:49
🔗
|
|
signius has joined #archiveteam-bs |
|
11:50
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
11:54
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
12:27
🔗
|
|
Sk2d has joined #archiveteam-bs |
|
12:27
🔗
|
|
PurpleSym has quit IRC (*) |
|
12:27
🔗
|
|
PurpleSym has joined #archiveteam-bs |
|
12:27
🔗
|
|
Sk1d has quit IRC (hub.se irc.du.se) |
|
12:43
🔗
|
|
Sk2d is now known as Sk1d |
|
12:43
🔗
|
|
metalcamp has quit IRC (Read error: Connection reset by peer) |
|
12:45
🔗
|
|
metalcamp has joined #archiveteam-bs |
|
12:49
🔗
|
|
VADemon has joined #archiveteam-bs |
|
13:58
🔗
|
|
brayden_ has joined #archiveteam-bs |
|
13:58
🔗
|
|
swebb sets mode: +o brayden_ |
|
14:02
🔗
|
|
brayden has quit IRC (Read error: Operation timed out) |
|
14:12
🔗
|
|
pgoetz has quit IRC (Remote host closed the connection) |
|
14:14
🔗
|
|
pgoetz has joined #archiveteam-bs |
|
14:14
🔗
|
|
pgoetz has quit IRC (Remote host closed the connection) |
|
14:43
🔗
|
|
metalcamp has quit IRC (Ping timeout: 258 seconds) |
|
15:02
🔗
|
|
pgoetz has joined #archiveteam-bs |
|
15:08
🔗
|
|
metalcamp has joined #archiveteam-bs |
|
15:37
🔗
|
|
brayden_ is now known as brayden |
|
16:20
🔗
|
godane |
SketchCow: i'm uploading 2010-09 of kpfa |
|
16:24
🔗
|
|
metalcamp has quit IRC (Ping timeout: 258 seconds) |
|
16:38
🔗
|
jut |
How do i easily get around a geoblock? |
|
16:38
🔗
|
jut |
our national broadcaster doesn't let me see videos even though i'm home. |
|
16:39
🔗
|
jut |
http://www.lrt.lt/mediateka/irasas/6375 |
|
16:49
🔗
|
vitzli |
try changing IP address? maybe they got new range and it was not updated in the GeoIP db |
|
16:51
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
|
16:56
🔗
|
MrRadar |
Also make sure your DNS server is not set to one outside your country (some services block using DNS) |
|
16:59
🔗
|
|
JesseW has joined #archiveteam-bs |
|
17:11
🔗
|
godane |
http://lrt-live.data.lt/mcache/_definst_/lrt/mp4:video2016/ZIN0_20160307.mp4/playlist.m3u8 |
|
17:12
🔗
|
godane |
http://lrt-live.data.lt:8080/media/radio/2016/03/1012615840.mp3 |
|
17:12
🔗
|
godane |
i found a way to grab stuff from that site |
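(Editor's sketch.) The chat does not say how the streams were grabbed. One plausible route for the `playlist.m3u8` link above is to read the playlist and resolve each entry against the playlist URL. The sample playlist body and the name `playlist_entries` are made up for illustration; nothing here was fetched from the real server.

```python
from urllib.parse import urljoin


def playlist_entries(playlist_url, playlist_text):
    """Return absolute URLs for every non-comment line of an M3U8 playlist."""
    entries = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; blank lines are skipped.
        if not line or line.startswith("#"):
            continue
        # Relative entries are resolved against the playlist's own URL.
        entries.append(urljoin(playlist_url, line))
    return entries


# Hypothetical playlist body, for illustration only.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
media_0.ts
#EXTINF:10.0,
media_1.ts
#EXT-X-ENDLIST
"""

if __name__ == "__main__":
    for url in playlist_entries("http://example.com/a/playlist.m3u8", SAMPLE_PLAYLIST):
        print(url)
```

The resulting list could then be handed to any downloader; a playlist that points at further playlists (variant streams) would need the same step applied again.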
|
17:22
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
|
17:33
🔗
|
|
xXx_ndidd has joined #archiveteam-bs |
|
17:38
🔗
|
|
bwn has joined #archiveteam-bs |
|
17:43
🔗
|
jut |
If we have some free time we could get them. They have a lot of historical footage and they already removed the download button. |
|
17:46
🔗
|
|
ndizzle has quit IRC (Read error: Operation timed out) |
|
18:13
🔗
|
|
vitzli has quit IRC (Leaving) |
|
18:13
🔗
|
SimpBrain |
arkiver, https://github.com/SimpleBrain/livejournal-dump |
|
18:15
🔗
|
arkiver |
SimpBrain: thanks! |
|
18:15
🔗
|
SimpBrain |
its a bit basic but it's doing the job |
|
18:15
🔗
|
arkiver |
I think I have the grab script almost ready for the warrior. Then we'll start on the items you found are working |
|
18:16
🔗
|
SimpBrain |
basically im filtering if a profile exists |
|
18:16
🔗
|
SimpBrain |
so you'll only be grabbing actual accounts |
|
18:16
🔗
|
|
ndizzle has joined #archiveteam-bs |
|
18:16
🔗
|
SimpBrain |
discovery could possibly be done via a warrior job if you want |
|
18:16
🔗
|
arkiver |
Yeah. Of course we can do packs of 100 IDs, but if it turns out there's 10 large sites from those 100 the item might become too big |
|
18:17
🔗
|
SimpBrain |
im guessing a lot of profiles wont have journals on there |
|
18:17
🔗
|
SimpBrain |
so out of the 77 million profiles |
|
18:17
🔗
|
arkiver |
Ok, I'll create a small warrior job for the discovery |
|
18:17
🔗
|
SimpBrain |
i put it about a million |
|
18:18
🔗
|
arkiver |
million existing accounts? |
|
18:18
🔗
|
SimpBrain |
blogs |
|
18:18
🔗
|
arkiver |
Ok, sunds like we have some work :) |
|
18:18
🔗
|
arkiver |
sounds* |
|
18:18
🔗
|
SimpBrain |
most will have signed up to probably comment |
|
18:18
🔗
|
SimpBrain |
hence why some accounts are empty |
|
18:19
🔗
|
arkiver |
Right |
|
18:19
🔗
|
arkiver |
I'll let you know when the discovery job is ready then |
|
18:19
🔗
|
SimpBrain |
so grabbing 100 account blocks, you may get 1 or 2 blogs in that |
|
18:19
🔗
|
SimpBrain |
if that |
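(Editor's sketch.) The "1 or 2 blogs per 100 accounts" figure follows from the numbers quoted above: roughly a million blogs spread over 77 million profiles. The snippet below splits an ID range into blocks of 100 and computes the expected blog count per block. The function names are illustrative, and the even-spread assumption is a simplification; as noted in the chat, a block could land on a cluster of blogs.

```python
TOTAL_PROFILES = 77_000_000  # "77 million profiles" from the chat
ESTIMATED_BLOGS = 1_000_000  # the rough "about a million" guess from the chat


def id_blocks(first_id, last_id, size=100):
    """Yield inclusive (start, end) ranges covering first_id..last_id."""
    for start in range(first_id, last_id + 1, size):
        yield start, min(start + size - 1, last_id)


def expected_blogs_per_block(size=100):
    """Average number of blogs in a block, assuming blogs are evenly spread."""
    return size * ESTIMATED_BLOGS / TOTAL_PROFILES


if __name__ == "__main__":
    print(list(id_blocks(1, 250)))
    print(round(expected_blogs_per_block(), 2))
```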
|
18:20
🔗
|
SimpBrain |
someone may get unlucky and hit a batch of blogs |
|
18:20
🔗
|
SimpBrain |
but yeah, hitting their profile page will reveal if a blog exists |
|
18:21
🔗
|
SimpBrain |
on the profile page will be "username's journal" near the top in big writing |
|
18:21
🔗
|
xmc |
and a post count |
|
18:21
🔗
|
SimpBrain |
if that link fails, then they dont have a blog |
|
18:22
🔗
|
SimpBrain |
http://www.livejournal.com/profile?userid=3160731&t=I |
|
18:22
🔗
|
SimpBrain |
thats just a random hit |
|
18:22
🔗
|
SimpBrain |
vs http://www.livejournal.com/profile?userid=3160738&t=I |
|
18:23
🔗
|
SimpBrain |
or this one http://www.livejournal.com/profile?userid=3160747&t=I |
|
18:24
🔗
|
SimpBrain |
so HTTP status codes 200 or 410 are the numbers required |
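(Editor's sketch.) The discovery check described above comes down to building a profile URL from a numeric user ID and keeping the IDs whose response code is 200 or 410. The chat does not spell out what each code means, so the snippet treats both simply as "keep this ID". The helper names are illustrative, and no request is made here; the status code would come from whatever HTTP client the job uses.

```python
# URL pattern taken from the example links in the chat.
PROFILE_URL = "http://www.livejournal.com/profile?userid={userid}&t=I"

# The two status codes named in the chat as the ones to keep.
WANTED_STATUS_CODES = (200, 410)


def profile_url(userid):
    """Build the profile URL for a numeric user ID."""
    return PROFILE_URL.format(userid=userid)


def keep_for_grab(status_code):
    """Return True when the profile response code is one of the wanted ones."""
    return status_code in WANTED_STATUS_CODES


if __name__ == "__main__":
    print(profile_url(3160731))
    print(keep_for_grab(200), keep_for_grab(404))
```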
|
18:26
🔗
|
SimpBrain |
if enough people hit the discovery with 1 concurrent, a few days. grabbing the content at a regulated crawl, a month? maybe less |
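(Editor's sketch.) A back-of-the-envelope check of the "a few days" estimate. Only the 77 million figure comes from the chat. The worker count and per-worker request rate are hypothetical values chosen for illustration, not measurements.

```python
SECONDS_PER_DAY = 86_400


def discovery_days(total_ids, workers, requests_per_second_each):
    """Days needed to request every ID once at a steady combined rate."""
    combined_rate = workers * requests_per_second_each
    return total_ids / (combined_rate * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Hypothetical: 300 warriors, each making 1 request per second.
    print(round(discovery_days(77_000_000, 300, 1.0), 2))
```

Under those assumed figures the full ID range takes about three days; halving the worker count doubles the time.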
|
18:26
🔗
|
SimpBrain |
nice slow churn project |
|
18:27
🔗
|
SimpBrain |
or longer if we find a lot of content |
|
18:28
🔗
|
SimpBrain |
but hammering the site too much will get you banned |
|
18:29
🔗
|
|
xXx_ndidd has quit IRC (Read error: Operation timed out) |
|
18:45
🔗
|
|
HCross has quit IRC (Read error: Connection reset by peer) |
|
18:49
🔗
|
|
HCross has joined #archiveteam-bs |
|
19:05
🔗
|
|
yipdw has quit IRC (Ping timeout: 1224 seconds) |
|
19:05
🔗
|
|
signius has quit IRC (Ping timeout: 345 seconds) |
|
19:05
🔗
|
|
FalconK has quit IRC (Ping timeout: 345 seconds) |
|
19:05
🔗
|
|
bauruine has quit IRC (Ping timeout: 316 seconds) |
|
19:05
🔗
|
|
balrog has quit IRC (Ping timeout: 345 seconds) |
|
19:05
🔗
|
|
dan- has quit IRC (Ping timeout: 345 seconds) |
|
19:05
🔗
|
|
HCross2 has quit IRC (Read error: Connection reset by peer) |
|
19:05
🔗
|
|
zhongfu has quit IRC (Remote host closed the connection) |
|
19:05
🔗
|
|
wp494_ has joined #archiveteam-bs |
|
19:05
🔗
|
|
winr4r has quit IRC (Write error: Connection reset by peer) |
|
19:05
🔗
|
|
_desu____ has joined #archiveteam-bs |
|
19:05
🔗
|
|
bauruine has joined #archiveteam-bs |
|
19:06
🔗
|
|
Boltsie_ has joined #archiveteam-bs |
|
19:06
🔗
|
|
balrog has joined #archiveteam-bs |
|
19:06
🔗
|
|
swebb sets mode: +o balrog |
|
19:06
🔗
|
|
deathy_ has joined #archiveteam-bs |
|
19:06
🔗
|
|
TheKiwi_ has joined #archiveteam-bs |
|
19:06
🔗
|
|
FalconK has joined #archiveteam-bs |
|
19:07
🔗
|
|
TheKiwi_ has quit IRC (Connection closed) |
|
19:07
🔗
|
|
TheKiwi_ has joined #archiveteam-bs |
|
19:07
🔗
|
|
zhongfu has joined #archiveteam-bs |
|
19:08
🔗
|
|
wp494 has quit IRC (Ping timeout: 274 seconds) |
|
19:08
🔗
|
|
_desu___ has quit IRC (Ping timeout: 274 seconds) |
|
19:08
🔗
|
|
Boltsie has quit IRC (Ping timeout: 274 seconds) |
|
19:08
🔗
|
|
JSharp___ has quit IRC (Ping timeout: 274 seconds) |
|
19:08
🔗
|
|
TheKiwi has quit IRC (Ping timeout: 274 seconds) |
|
19:08
🔗
|
|
Ctrl-S___ has quit IRC (Ping timeout: 274 seconds) |
|
19:08
🔗
|
|
deathy has quit IRC (Ping timeout: 274 seconds) |
|
19:08
🔗
|
|
_desu____ is now known as _desu___ |
|
19:08
🔗
|
|
Boltsie_ is now known as Boltsie |
|
19:08
🔗
|
|
deathy_ is now known as deathy |
|
19:08
🔗
|
|
TheKiwi_ is now known as TheKiwi |
|
19:08
🔗
|
|
dan- has joined #archiveteam-bs |
|
19:08
🔗
|
|
HCross2 has joined #archiveteam-bs |
|
19:10
🔗
|
|
signius has joined #archiveteam-bs |
|
19:10
🔗
|
|
TheKiwi has quit IRC (Remote host closed the connection) |
|
19:11
🔗
|
|
TheKiwi has joined #archiveteam-bs |
|
19:12
🔗
|
|
winr4r has joined #archiveteam-bs |
|
19:23
🔗
|
|
jut has quit IRC (jut) |
|
19:52
🔗
|
ersi |
Why are you guys doing LiveJournal? |
|
19:52
🔗
|
ersi |
Also, I've got a pretty decently large username discovery dump from a year ago that I put together |
|
19:54
🔗
|
ersi |
sqlite> SELECT COUNT(*) FROM users; |
|
19:54
🔗
|
ersi |
6647647 |
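(Editor's sketch.) The same count can be run from Python with the standard `sqlite3` module. Only the table name `users` comes from the chat; the single `username` column and the sample rows are a hypothetical schema used so the snippet runs on its own.

```python
import sqlite3


def count_users(conn):
    """Return the number of rows in the users table."""
    (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    return count


# In-memory database with a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("alice",), ("bob",), ("carol",)],
)

if __name__ == "__main__":
    print(count_users(conn))
```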
|
20:04
🔗
|
SimpBrain |
what count did you get up to? |
|
20:05
🔗
|
JW_work |
ersi: because it is very old, widely regarded as in decline, and has a lot of important stuff buried in it |
|
20:05
🔗
|
ersi |
AFAIK SketchCow knows people on ze insizes |
|
20:05
🔗
|
ersi |
insidez |
|
20:08
🔗
|
ersi |
LJ has an API by the way, which is pretty useful |
|
20:09
🔗
|
xmc |
^ |
|
20:12
🔗
|
ersi |
Especially for finding users at least |
|
20:29
🔗
|
|
ndiddy has joined #archiveteam-bs |
|
20:30
🔗
|
|
ndizzle has quit IRC (Read error: Operation timed out) |
|
20:37
🔗
|
godane |
i'm starting to upload more of Joystiq Massively Speaking |
|
20:38
🔗
|
godane |
2013 to 2015 episodes need to be uploaded |
|
20:39
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
20:42
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
21:01
🔗
|
|
metalcamp has joined #archiveteam-bs |
|
21:33
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
|
21:56
🔗
|
|
fie has joined #archiveteam-bs |
|
22:07
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
22:09
🔗
|
|
metalcamp has quit IRC (Ping timeout: 258 seconds) |
|
22:11
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
22:51
🔗
|
|
yipdw has joined #archiveteam-bs |
|
22:54
🔗
|
|
wp494_ is now known as wp494 |
|
22:58
🔗
|
yipdw |
huh, what the F |
|
22:58
🔗
|
yipdw |
Mar 2 18:02:02 avatar kernel: [216706.800224] bitlbee invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 |
|
22:58
🔗
|
yipdw |
i've never seen bitlbee get oompa'd |
|
23:01
🔗
|
FalconK |
huh. |
|
23:01
🔗
|
FalconK |
harsh. |
|
23:01
🔗
|
FalconK |
say, yipdw, did you ever figure out who to talk to about collection permissions? |
|
23:01
🔗
|
yipdw |
not yet, but it's probably SketchCow |
|
23:01
🔗
|
|
RedType has left |
|
23:05
🔗
|
yipdw |
on a different note, Grimes is awesome because she inserted "8===D" into the liner notes of Art Angels |
|
23:17
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
|
23:57
🔗
|
FalconK |
yipdw: https://github.com/falconkirtaran/ArchiveBot/commit/217064195aeb8b43c17a197f5362bce231aea67f |
|
23:57
🔗
|
FalconK |
ready to test |
|
23:57
🔗
|
FalconK |
SketchCow: any chance of creds for an IA collection? |