Time |
Nickname |
Message |
00:11
🔗
|
|
Start has joined #archiveteam |
00:23
🔗
|
|
maseck_ has quit IRC (Quit: No Ping reply in 180 seconds.) |
00:23
🔗
|
|
maseck has joined #archiveteam |
00:42
🔗
|
|
hive-mind has quit IRC (Ping timeout: 260 seconds) |
00:43
🔗
|
|
hive-mind has joined #archiveteam |
00:45
🔗
|
|
wyatt8740 has quit IRC (Read error: Operation timed out) |
00:45
🔗
|
|
wyatt8740 has joined #archiveteam |
00:52
🔗
|
|
Ghost_of_ has joined #archiveteam |
00:55
🔗
|
|
Atom__ has quit IRC (Read error: Connection reset by peer) |
00:56
🔗
|
|
Atom__ has joined #archiveteam |
01:00
🔗
|
|
wyatt8740 has quit IRC (Read error: Operation timed out) |
01:01
🔗
|
|
wyatt8740 has joined #archiveteam |
01:28
🔗
|
|
kyan has joined #archiveteam |
01:51
🔗
|
|
vitzli has joined #archiveteam |
02:01
🔗
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
02:09
🔗
|
|
philpem has quit IRC (Ping timeout: 260 seconds) |
02:13
🔗
|
|
JesseW has joined #archiveteam |
02:51
🔗
|
|
Ghost_of_ has quit IRC (Quit: Leaving) |
03:20
🔗
|
|
vtyl has joined #archiveteam |
03:21
🔗
|
|
lytv has quit IRC (Read error: Operation timed out) |
03:21
🔗
|
|
einstein9 has joined #archiveteam |
03:45
🔗
|
|
kyan has quit IRC (This computer has gone to sleep) |
03:48
🔗
|
|
vitzli has quit IRC (Leaving) |
04:06
🔗
|
|
lukeman has joined #archiveteam |
04:13
🔗
|
dashcloud |
Christopher Rush, one of the early Magic: the Gathering card artists, has died |
04:17
🔗
|
|
ianweller has left |
04:25
🔗
|
|
mutoso has quit IRC (Read error: Operation timed out) |
04:34
🔗
|
|
mutoso has joined #archiveteam |
04:43
🔗
|
|
mutoso has quit IRC (Ping timeout: 252 seconds) |
04:50
🔗
|
|
mutoso has joined #archiveteam |
04:51
🔗
|
|
Coderjoe has quit IRC (Read error: Operation timed out) |
05:17
🔗
|
|
mutoso has quit IRC (Read error: Operation timed out) |
05:21
🔗
|
|
Coderjoe has joined #archiveteam |
05:22
🔗
|
|
ndiddy has quit IRC (Quit: Leaving) |
05:41
🔗
|
|
Sk1d has quit IRC (Ping timeout: 200 seconds) |
05:47
🔗
|
|
Sk1d has joined #archiveteam |
06:03
🔗
|
|
vitzli has joined #archiveteam |
06:11
🔗
|
|
vitzli has quit IRC (Leaving) |
06:22
🔗
|
|
WinterFox has joined #archiveteam |
06:22
🔗
|
|
vitzli has joined #archiveteam |
07:26
🔗
|
|
megaminxw has joined #archiveteam |
07:56
🔗
|
Fletcher |
Are any projects hitting FOS particularly hard? I've dropped to 1MB/s upload |
07:57
🔗
|
einstein9 |
GameTrailers is ~1GB per video |
07:58
🔗
|
Fletcher |
ah, that may be where the problem is |
07:59
🔗
|
Fletcher |
(my archivebot upload directory has ballooned to 600G) |
07:59
🔗
|
|
schbirid has joined #archiveteam |
08:03
🔗
|
JesseW |
Fletcher: which pipelines do you run, anyway? |
08:04
🔗
|
Fletcher |
F_* |
08:05
🔗
|
JesseW |
ah, cool |
08:06
🔗
|
JesseW |
thank you for running those |
08:06
🔗
|
JesseW |
Do you know who runs the aupipe ones? |
08:07
🔗
|
Fletcher |
no idea I'm afraid :/ |
08:08
🔗
|
Fletcher |
should be able to trace it back to when the ssh key was submitted though |
08:09
🔗
|
JesseW |
hm, that's not public info, though, right? |
08:10
🔗
|
JesseW |
and what about nico-only_at_home (which seems to have been stuck for a couple of weeks)? |
08:11
🔗
|
|
signius has quit IRC (Remote host closed the connection) |
08:13
🔗
|
|
signius has joined #archiveteam |
08:18
🔗
|
Fletcher |
yipdw would be the only one with key info |
08:18
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
08:18
🔗
|
yipdw |
trs80 runs aupipe |
08:19
🔗
|
yipdw |
I think I have SSH access to it from the control node |
08:32
🔗
|
|
kyan has joined #archiveteam |
08:48
🔗
|
|
kyan has quit IRC (Leaving) |
09:03
🔗
|
trs80 |
hmm? yeah, I have aupipe |
09:03
🔗
|
trs80 |
jessew, fletcher: ^^ |
09:08
🔗
|
|
Stiletto has joined #archiveteam |
09:23
🔗
|
|
Sk1d has left |
09:34
🔗
|
|
Stilett0 has joined #archiveteam |
09:39
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
09:41
🔗
|
|
Stilett0 is now known as Stiletto |
09:51
🔗
|
|
lukeman has quit IRC (Quit: My MacBook Pro has gone to sleep. ZZZzzz…) |
09:57
🔗
|
|
Stilett0 has joined #archiveteam |
09:57
🔗
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
10:01
🔗
|
|
Stilett0 is now known as Stiletto |
10:01
🔗
|
|
Stiletto has quit IRC (Remote host closed the connection) |
10:01
🔗
|
|
Stiletto has joined #archiveteam |
10:04
🔗
|
|
mutoso has joined #archiveteam |
10:13
🔗
|
SketchCow |
Gah |
10:13
🔗
|
SketchCow |
OK |
10:38
🔗
|
SketchCow |
Load average of FOS is now up to 40, what could go wrong |
10:39
🔗
|
midas |
could go up to 80 |
10:41
🔗
|
SketchCow |
Could go down to 0 |
10:43
🔗
|
|
vtyl has quit IRC (Quit: Leaving) |
10:45
🔗
|
|
Cameron_D has joined #archiveteam |
10:49
🔗
|
|
lytv has joined #archiveteam |
10:56
🔗
|
|
bzc6p has joined #archiveteam |
10:56
🔗
|
|
swebb sets mode: +o bzc6p |
10:58
🔗
|
|
Stiletto is now known as Stilett0 |
11:02
🔗
|
|
alberto has joined #archiveteam |
11:06
🔗
|
bzc6p |
lol |
11:06
🔗
|
bzc6p |
Chorca: you said 40 terabytes? That's awesome. |
11:07
🔗
|
bzc6p |
einstein9: Not in Hungary. |
11:08
🔗
|
einstein9 |
k |
11:13
🔗
|
|
bzc6p sets mode: +oooo achip aliz chazchaz chfoo |
11:13
🔗
|
|
bzc6p sets mode: +oooo chfoo- closure Coderjoe Ctrl-S___ |
11:13
🔗
|
|
bzc6p sets mode: +oooo dashcloud Fletcher Fletcher_ Fusl |
11:13
🔗
|
|
bzc6p sets mode: +oooo GLaDOS godane HCross2 Infreq_ |
11:14
🔗
|
|
bzc6p sets mode: +oooo ivan` joepie91 Kazzy Kenshin |
11:14
🔗
|
|
bzc6p sets mode: +oooo midas Muad-Dib Nemo_bis ohhdemgir |
11:14
🔗
|
|
bzc6p sets mode: +oooo phuzion PurpleSym Sanqui schbirid |
11:14
🔗
|
|
bzc6p sets mode: +oooo SimpBrain SmileyG Start trs80 |
11:14
🔗
|
|
bzc6p sets mode: +ooo vitzli wp494 wyatt8740 |
11:43
🔗
|
|
bzc6p_ has joined #archiveteam |
11:43
🔗
|
|
swebb sets mode: +o bzc6p_ |
11:44
🔗
|
|
bzc6p has quit IRC (Ping timeout: 250 seconds) |
11:55
🔗
|
|
WinterFox has quit IRC (Remote host closed the connection) |
12:01
🔗
|
|
einstein9 has quit IRC (Read error: Operation timed out) |
12:05
🔗
|
|
bzc6p_ has quit IRC (Ping timeout: 250 seconds) |
12:39
🔗
|
|
bzc6p has joined #archiveteam |
12:39
🔗
|
|
swebb sets mode: +o bzc6p |
12:43
🔗
|
|
signius_ has joined #archiveteam |
12:44
🔗
|
|
arkiver3 has joined #archiveteam |
12:45
🔗
|
|
signius has quit IRC (Read error: Operation timed out) |
12:51
🔗
|
|
arkiver3 has quit IRC (Ping timeout: 252 seconds) |
12:51
🔗
|
|
arkiver3 has joined #archiveteam |
12:55
🔗
|
|
arkiver3 has quit IRC (Ping timeout: 252 seconds) |
12:56
🔗
|
|
arkiver3 has joined #archiveteam |
12:56
🔗
|
|
signius_ has quit IRC (Remote host closed the connection) |
12:58
🔗
|
|
weles has joined #archiveteam |
12:58
🔗
|
|
signius has joined #archiveteam |
13:15
🔗
|
|
bzc6p has quit IRC (Ping timeout: 250 seconds) |
13:17
🔗
|
|
Atom-- has joined #archiveteam |
13:20
🔗
|
|
arkiver3 has quit IRC (Ping timeout: 252 seconds) |
13:20
🔗
|
|
Atom__ has quit IRC (Ping timeout: 252 seconds) |
13:20
🔗
|
|
Atom__ has joined #archiveteam |
13:21
🔗
|
|
Atom-- has quit IRC (Ping timeout: 252 seconds) |
13:22
🔗
|
|
arkiver3 has joined #archiveteam |
13:22
🔗
|
|
bzc6p has joined #archiveteam |
13:22
🔗
|
|
swebb sets mode: +o bzc6p |
13:27
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
13:28
🔗
|
schbirid |
http://www.factmag.com/2016/02/11/soundcloud-financial-report-44m-losses/ |
13:36
🔗
|
schbirid |
https://soundcloud.com/people/directory |
13:37
🔗
|
schbirid |
track ids go up to ~300 million |
13:43
🔗
|
|
Stiletto has joined #archiveteam |
13:46
🔗
|
|
arkiver3 has quit IRC (Ping timeout: 252 seconds) |
14:10
🔗
|
|
megaminxw has quit IRC (Quit: Leaving.) |
14:15
🔗
|
|
arkiver3 has joined #archiveteam |
14:37
🔗
|
|
[phire] has quit IRC (Quit: ZNC - http://znc.in) |
14:38
🔗
|
|
vOYtEC_ has quit IRC (Quit: rm -r *) |
14:54
🔗
|
|
[phire] has joined #archiveteam |
14:59
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
15:00
🔗
|
johtso |
that's a whole lot of data.. |
15:03
🔗
|
balrog |
shit. http://www.factmag.com/2016/02/11/soundcloud-financial-report-44m-losses/ |
15:03
🔗
|
balrog |
I see people posted it |
15:03
🔗
|
balrog |
ugh the copyright cartel has been after them |
15:04
🔗
|
balrog |
even though they are mostly user-generated content |
15:04
🔗
|
MrRadar |
That's like how the RIAA requires any public place that allows bands to play to get a license even if they require the bands to only play original music "just in case" they accidentally play a cover |
15:06
🔗
|
bzc6p |
There has been a discussion about SoundCloud back in August: http://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-08-27,Thu&sel=180#l176 |
15:06
🔗
|
bzc6p |
SketchCow There's no way we can get Soundcloud |
15:06
🔗
|
balrog |
"2.5 PB of data" |
15:06
🔗
|
balrog |
yikes. |
15:07
🔗
|
bzc6p |
(it was a quote) |
15:07
🔗
|
balrog |
but how do you identify the original content? |
15:07
🔗
|
vitzli |
is there a way to get certain channels? |
15:08
🔗
|
MrRadar |
youtube-dl supports SoundCloud |
15:08
🔗
|
MrRadar |
So if you have any artists you follow now is the time to rip them |
15:08
🔗
|
balrog |
grr lots of podcasts host there |
15:09
🔗
|
vitzli |
yep, going after startalk radio |
15:09
🔗
|
vitzli |
thank you |
15:22
🔗
|
|
bzc6p has left |
15:24
🔗
|
arkiver3 |
If we want a project for SoundCloud and SketchCow agrees we can start a project |
15:24
🔗
|
vitzli |
maybe do census over wikipedia external links to soundcloud |
15:24
🔗
|
vitzli |
& |
15:24
🔗
|
vitzli |
dammit, it was a question mark, sorry |
15:27
🔗
|
arkiver3 |
Who is recording the gravitational waves livestream?(!!) |
15:28
🔗
|
arkiver3 |
start in 2 minutes! |
15:28
🔗
|
arkiver3 |
can't record it from where I am, so someone please record |
15:29
🔗
|
snape |
URL? |
15:31
🔗
|
arkiver3 |
https://www.youtube.com/watch?v=c7293kAiPZw |
15:33
🔗
|
MrRadar |
arkiver: Can I just throw the URL into youtube-dl? |
15:34
🔗
|
arkiver3 |
I don't think it records livestreams |
15:34
🔗
|
MrRadar |
What's the recommended method then? |
15:34
🔗
|
arkiver3 |
But I think the YouTube stream will also be online as a video after the stream |
15:34
🔗
|
arkiver3 |
and I'm totally sure someone else is recording this |
15:35
🔗
|
snape |
With youtube-dl mpegts complains about continuity errors, sigh. |
15:36
🔗
|
joepie91 |
arkiver3: see #archivebot re: al jazeera |
15:37
🔗
|
joepie91 |
MrRadar: arkiver3: yes, youtube-dl records livestreams in theory, but it has never worked for me |
15:38
🔗
|
MrRadar |
Hmm, youtube-dl is grabbing a bunch of segments for me though I don't know if it will continue |
15:39
🔗
|
|
Start has joined #archiveteam |
15:46
🔗
|
|
Zei-Pii has joined #archiveteam |
15:51
🔗
|
|
zzqw has quit IRC (Ping timeout: 252 seconds) |
15:52
🔗
|
|
z00nx has quit IRC (Ping timeout: 252 seconds) |
15:52
🔗
|
|
Fletcher has quit IRC (Ping timeout: 252 seconds) |
15:52
🔗
|
|
goekesmi has quit IRC (Ping timeout: 250 seconds) |
15:53
🔗
|
schbirid |
soundcloud STORING 2.5 PB will be including original formats (wav etc) and private tracks |
15:53
🔗
|
|
Rickster has quit IRC (Ping timeout: 260 seconds) |
15:54
🔗
|
|
HCross has quit IRC (Ping timeout: 250 seconds) |
15:55
🔗
|
|
rduser has quit IRC (Ping timeout: 260 seconds) |
15:55
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
15:56
🔗
|
|
midas has quit IRC (Ping timeout: 260 seconds) |
15:56
🔗
|
|
ivan` has quit IRC (Read error: Operation timed out) |
15:57
🔗
|
|
arkiver3 has quit IRC (Quit: Nettalk6 - www.ntalk.de) |
15:57
🔗
|
|
Test__ has joined #archiveteam |
15:58
🔗
|
|
sevs44936 has quit IRC (Ping timeout: 633 seconds) |
15:59
🔗
|
|
Test__ has quit IRC (Client Quit) |
16:02
🔗
|
|
Rickster has joined #archiveteam |
16:03
🔗
|
|
hawc145 has joined #archiveteam |
16:03
🔗
|
|
z00nx has joined #archiveteam |
16:04
🔗
|
|
sevs44936 has joined #archiveteam |
16:05
🔗
|
|
rduser has joined #archiveteam |
16:05
🔗
|
|
midas has joined #archiveteam |
16:09
🔗
|
|
zzqw has joined #archiveteam |
16:09
🔗
|
|
marvinw has joined #archiveteam |
16:11
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
16:14
🔗
|
|
goekesmi has joined #archiveteam |
16:16
🔗
|
schbirid |
"youtube-dl http://api.soundcloud.com/tracks/182804938" works by id. gets best audio format. this one is wav (and a fine mix) |
16:16
🔗
|
|
sevs44936 has quit IRC (Read error: Operation timed out) |
16:17
🔗
|
|
sevs44936 has joined #archiveteam |
16:19
🔗
|
|
K4k has joined #archiveteam |
16:21
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
16:21
🔗
|
|
Famicoman has joined #archiveteam |
16:24
🔗
|
|
Start has joined #archiveteam |
16:25
🔗
|
|
Start has quit IRC (Remote host closed the connection) |
16:25
🔗
|
|
Start has joined #archiveteam |
16:34
🔗
|
|
godane has joined #archiveteam |
16:36
🔗
|
SketchCow |
1. Please go after Al-Jazeera America. |
16:36
🔗
|
SketchCow |
2. Please at least go after the most popular soundcloud stuff |
16:40
🔗
|
|
Fletcher has joined #archiveteam |
16:42
🔗
|
swebb |
I can send out what I built with regards to the soundcloud grab. |
16:42
🔗
|
swebb |
Userlists, etc ... |
16:44
🔗
|
|
hawc145 is now known as HCross |
16:51
🔗
|
snape |
Might be worth trying to grab all the CBC podcasts off there, as they have a history of quietly disappearing over time, elsewhere. |
16:52
🔗
|
SketchCow |
Please do |
16:56
🔗
|
snape |
Can anyone think of other gov't-run stuff on Soundcloud? I see the Voice of America has several thousand programs in Chinese, and Germany's Deutsche Welle has almost 2300 items going back to at least 2012. |
16:57
🔗
|
Nemo_bis |
Surely they'll preserve "user data" online! http://www.theguardian.com/media/2016/feb/11/time-inc-buys-what-is-left-of-myspace-for-its-user-data |
17:03
🔗
|
snape |
arkiver, looks like the NSF livestream ended. I have 1.3GB of it, though I'm not sure if it's viewable or not. |
17:04
🔗
|
schbirid |
#soundbutt ? :) |
17:04
🔗
|
|
vOYtEC has joined #archiveteam |
17:05
🔗
|
MrRadar |
LOL |
17:05
🔗
|
MrRadar |
During the last soundcloud scare we used #soundclown |
17:06
🔗
|
|
Fletcher_ sets mode: +o Fletcher |
17:06
🔗
|
xmc |
probably should reuse that for consistency |
17:06
🔗
|
schbirid |
yeah |
17:07
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
17:07
🔗
|
|
marvinw is now known as ivan` |
17:11
🔗
|
|
JesseW has joined #archiveteam |
17:20
🔗
|
snape |
FWIW, youtube-dl will (eventually...) grab all of a Soundcloud user's tracks if you point it at soundcloud.com/username/tracks |
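A minimal sketch of the approach snape describes, assuming youtube-dl is installed and on the PATH. The helper name `build_soundcloud_command`, the archive file name, and the output template are illustrative choices, not something from the channel. It only builds the command line, so it can be inspected before anything is downloaded.

```python
import shlex
from typing import List


def build_soundcloud_command(username: str,
                             archive_file: str = "downloaded.txt") -> List[str]:
    """Build a youtube-dl argv list for one SoundCloud user's tracks page.

    The function name and defaults are hypothetical; the flags are
    standard youtube-dl options.
    """
    url = f"https://soundcloud.com/{username}/tracks"
    return [
        "youtube-dl",
        "--download-archive", archive_file,  # skip tracks already fetched on a re-run
        "--write-info-json",                 # save per-track metadata next to the audio
        "--ignore-errors",                   # keep going if a single track fails
        "-o", "%(uploader)s/%(title)s-%(id)s.%(ext)s",
        url,
    ]


# Print the command for a placeholder username instead of running it.
command = build_soundcloud_command("example-user")
print(" ".join(shlex.quote(arg) for arg in command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would start the actual download; large accounts can take a long time, as noted above.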
17:25
🔗
|
HCross |
Can whoever is running a newsbuddy instance without asking please stop. |
17:27
🔗
|
antomatic |
Yikes.. got a gametrailers item here that's going to take 116 hours to rsync! |
17:27
🔗
|
antomatic |
all good fun. :) |
17:27
🔗
|
swebb |
My archive of the soundcloud crawl that I did in November: https://www.amazon.com/clouddrive/share/7thOzbwVF2hwD5iVfXYESu1DhaksPGHDFZBJWs9FQIU?ref_=cd_ph_share_link_copy |
17:28
🔗
|
swebb |
It's a mysqldump of the user data that I accumulated - hundreds of millions of users - I targeted the most followed and the people with the most followers. 2.5G bzipped. Includes the url to their profile, their user-id, username, follower number, followings number & avatar url. From that data, you can crawl their content pretty easily. Soundcloud eventually blocked me from crawling them, but I was able to crawl them for 5-6 months before they |
17:28
🔗
|
swebb |
found and blocked my IP. |
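As a rough illustration of how the fields swebb lists could be used to pick the most-followed accounts first, here is a small sketch. The `SoundcloudUser` class, its field names, and the sample rows are assumptions made up for the example; the real dump's table and column names are not given in the log.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SoundcloudUser:
    # Hypothetical field names mirroring the fields described in the message.
    user_id: int
    username: str
    profile_url: str
    followers: int
    followings: int
    avatar_url: str


def most_followed(users: Iterable[SoundcloudUser], limit: int) -> List[SoundcloudUser]:
    """Return up to `limit` users, highest follower count first."""
    return sorted(users, key=lambda u: u.followers, reverse=True)[:limit]


# Made-up sample rows, purely for demonstration.
sample = [
    SoundcloudUser(1, "alpha", "https://soundcloud.com/alpha", 120, 10, ""),
    SoundcloudUser(2, "beta", "https://soundcloud.com/beta", 9000, 3, ""),
    SoundcloudUser(3, "gamma", "https://soundcloud.com/gamma", 450, 77, ""),
]

for user in most_followed(sample, limit=2):
    print(user.profile_url + "/tracks")
```

The printed URLs have the same shape as the `soundcloud.com/username/tracks` pages mentioned earlier, so a ranked list like this could feed a per-user grab.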
17:30
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
17:32
🔗
|
|
JesseW has joined #archiveteam |
17:33
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
17:35
🔗
|
|
JesseW has quit IRC (Client Quit) |
17:36
🔗
|
joepie91 |
... 1.5MB/sec from amazon cloud drive |
17:36
🔗
|
|
schbirid has joined #archiveteam |
17:48
🔗
|
|
Fletcher_ has quit IRC (Quit: WeeChat 0.4.3) |
17:49
🔗
|
|
Tomcat_ has joined #archiveteam |
17:50
🔗
|
|
Fletcher_ has joined #archiveteam |
17:53
🔗
|
swebb |
Yea, ACD isn't the most high-speed thing in the world. :) |
17:53
🔗
|
swebb |
but I got unlimited storage for a year for $5. :) |
17:55
🔗
|
swebb |
Hard to beat that. :) |
17:58
🔗
|
Nemo_bis |
What offer is that :o |
18:01
🔗
|
swebb |
It was a Christmas thing I think. The normal unlimited ACD plan is $50/yr |
18:01
🔗
|
phuzion |
Nemo_bis: Amazon had a promotional offer for 1 year of ACD for $5 |
18:01
🔗
|
swebb |
or a black friday thing |
18:01
🔗
|
phuzion |
yeah |
18:01
🔗
|
|
Fletcher sets mode: +o Fletcher_ |
18:02
🔗
|
vitzli |
black friday deal, got one too |
18:02
🔗
|
swebb |
They may do it again this year - it was pretty popular with data hoarders |
18:04
🔗
|
vitzli |
how big is the database when imported? 7-8GB? |
18:05
🔗
|
swebb |
Not sure, sorry. |
18:05
🔗
|
swebb |
Probably larger than 10GB |
18:21
🔗
|
|
rduser has quit IRC (Read error: Operation timed out) |
18:34
🔗
|
|
hive-mind has quit IRC (Ping timeout: 260 seconds) |
18:36
🔗
|
|
hive-mind has joined #archiveteam |
18:37
🔗
|
|
Tomcat_ has quit IRC (Ping timeout: 362 seconds) |
18:42
🔗
|
|
Start has joined #archiveteam |
18:43
🔗
|
|
vitzli has quit IRC (Leaving) |
18:44
🔗
|
SmileyG |
http://www.factmag.com/2016/02/11/soundcloud-financial-report-44m-losses/ |
18:45
🔗
|
SmileyG |
what's ACD? |
18:46
🔗
|
snape |
Amazon Cloud Drive |
18:49
🔗
|
|
Tomcat_ has joined #archiveteam |
19:04
🔗
|
SketchCow |
FOS is holding up |
19:04
🔗
|
SketchCow |
69% full |
19:04
🔗
|
SketchCow |
But stuff is going out |
19:12
🔗
|
Chorca |
man, GT uploads going so slow |
19:16
🔗
|
SketchCow |
Well, you're hammering the literal hell out of the machine. |
19:16
🔗
|
Chorca |
haha, i assumed. When we first started i hit like 25MB/s |
19:19
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
19:23
🔗
|
|
Start has joined #archiveteam |
19:25
🔗
|
|
rduser has joined #archiveteam |
19:40
🔗
|
|
Tomcat_ has quit IRC (Ping timeout: 362 seconds) |
19:43
🔗
|
swebb |
Oh, and here's my crawler code, BTW: https://github.com/scumola/soundcloud-crawler If you want to duplicate my crawl setup, you'll need mysql, rabbitmq and couchbase (or memcache) running somewhere on your network. |
19:44
🔗
|
SketchCow |
I would choose Al Jazeera over Soundcloud |
19:44
🔗
|
SketchCow |
But first we need to get through these two others. |
19:44
🔗
|
SketchCow |
We also maybe need another rsync target |
19:44
🔗
|
SketchCow |
That then pushes to FOS before I upload |
19:46
🔗
|
|
RichardG has quit IRC (Ping timeout: 250 seconds) |
19:52
🔗
|
|
RichardG has joined #archiveteam |
19:54
🔗
|
|
scyther has joined #archiveteam |
20:04
🔗
|
SimpBrain |
many sites failing, fos needs to breathe :P |
20:05
🔗
|
MrRadar |
Yeah, it's been crazy lately |
20:05
🔗
|
arkiver |
SketchCow: FOS can handle them, GameTrailers is almost done |
20:12
🔗
|
|
LibreWulf has joined #archiveteam |
20:12
🔗
|
arkiver |
chfoo: can you please send me the logs of gametrailers? |
20:14
🔗
|
|
Sk1d has joined #archiveteam |
20:14
🔗
|
chfoo |
arkiver: ok, give me a few minutes |
20:14
🔗
|
arkiver |
thanks! I'd like to make sure everything went well |
20:15
🔗
|
LibreWulf |
I'm not sure if anyone had heard of this, but there are rumors that soundcloud is having tough financial times. |
20:15
🔗
|
HCross |
#soundclown |
20:16
🔗
|
LibreWulf |
yeah essentially. what on earth will I do without my shitty joke mixes |
20:16
🔗
|
LibreWulf |
But I did figure I'd stop in and at least mention it. Several sites and blogs and whatnot are doubting they'll survive a lot longer |
20:17
🔗
|
Sanqui |
no, they meant, we literally have channel #soundclown for it |
20:17
🔗
|
LibreWulf |
Oh, really? I had no clue, thanks |
20:18
🔗
|
|
megaminxw has joined #archiveteam |
20:19
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
20:25
🔗
|
|
Sk1d has joined #archiveteam |
20:27
🔗
|
|
dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) |
20:27
🔗
|
|
ploopkazo has joined #archiveteam |
20:28
🔗
|
|
dashcloud has joined #archiveteam |
20:33
🔗
|
|
LibreWulf has quit IRC (Quit: changing clients) |
20:44
🔗
|
|
Tomcat_ has joined #archiveteam |
20:45
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
20:46
🔗
|
|
Zei-Pii has quit IRC (Read error: Connection reset by peer) |
20:52
🔗
|
|
megaminxw has quit IRC (Quit: Leaving.) |
20:58
🔗
|
|
useretail has quit IRC (Ping timeout: 252 seconds) |
21:01
🔗
|
|
nickname_ has joined #archiveteam |
21:01
🔗
|
arkiver |
Atluxity: are you coming back to the gametrailers grab? |
21:01
🔗
|
nickname_ |
soundclown is the name of a joke service, parodying soundcloud. |
21:02
🔗
|
Atluxity |
arkiver: I am unable to... the items are too big |
21:03
🔗
|
HCross |
Seems BnAboyZ has you covered though |
21:03
🔗
|
Atluxity |
I restart the grab a couple times a day, and they go through a pile of items, then get filled |
21:03
🔗
|
Atluxity |
:\ |
21:04
🔗
|
Atluxity |
if the pipeline could recognize a full disk, remove it, and ask for a new item, that would help me a lot |
21:04
🔗
|
Atluxity |
*remove the current item from disk |
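Atluxity's suggestion could look roughly like the sketch below. This is not how the actual pipeline works; `drop_item_if_disk_full` and its parameters are hypothetical names, and the only library calls used are `shutil.disk_usage` and `shutil.rmtree` from the Python standard library. Requesting a replacement item from the tracker is left to the caller.

```python
import shutil
from pathlib import Path
from typing import Callable


def free_bytes(path: Path) -> int:
    """Free space, in bytes, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def drop_item_if_disk_full(item_dir: Path,
                           min_free: int,
                           get_free: Callable[[Path], int] = free_bytes) -> bool:
    """Delete `item_dir` and return True when free space is below `min_free`.

    Returns False and leaves the directory alone when there is enough room.
    `get_free` is injectable so the check can be exercised without a full disk.
    """
    if get_free(item_dir.parent) >= min_free:
        return False
    shutil.rmtree(item_dir, ignore_errors=True)
    return True
```

A caller would run this check between download steps and, when it returns True, report the item as failed so it can be handed to another worker.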
21:08
🔗
|
nickname_ |
How would you use youtube-dl to download json info from soundcloud? |
21:09
🔗
|
|
weles has quit IRC (Read error: Operation timed out) |
21:09
🔗
|
nickname_ |
Ignore it, it's an offtopic question |
21:18
🔗
|
|
useretail has joined #archiveteam |
21:19
🔗
|
Muad-Dib |
<arkiver> SketchCow: FOS can handle them, GameTrailers is almost done |
21:19
🔗
|
Muad-Dib |
lol, I only just found out :") |
21:19
🔗
|
Muad-Dib |
about GT dying I mean |
21:33
🔗
|
SketchCow |
The uploads are going slow, but not that slow. |
21:39
🔗
|
arkiver |
SketchCow: we're discussing in #soundclown on what to do |
21:39
🔗
|
arkiver |
we're thinking discovery to get the most popular tracks |
21:39
🔗
|
arkiver |
then we'll decide on what to grab |
21:41
🔗
|
MrRadar |
Is there a channel for discussing Al Jazeera? |
21:41
🔗
|
arkiver |
We'll do a small grab of all links of al jazeera from the sitemap |
21:41
🔗
|
arkiver |
that should have all articles |
21:41
🔗
|
arkiver |
but the project will start in the weekend |
21:41
🔗
|
MrRadar |
There's also the ArchiveBot grab that's been running for about a month |
21:42
🔗
|
swebb |
Can't archive.org just archive Al Jazeera America? |
21:42
🔗
|
swebb |
It looks to be a pretty normal website. |
21:42
🔗
|
arkiver |
I think we're the contributors to IA that go into specific sites |
21:42
🔗
|
arkiver |
IA more goes into wide crawls |
21:42
🔗
|
swebb |
Ahh. |
21:43
🔗
|
arkiver |
SketchCow would know best though |
21:43
🔗
|
MrRadar |
It would also be nice for future researchers to be able to download WARCs with the entire site |
21:43
🔗
|
swebb |
I could just spin up an archive crawler myself - I've done that in the past. |
21:43
🔗
|
MrRadar |
Instead of having to trudge through the Wayback Machine |
21:43
🔗
|
SketchCow |
Generally, they do focused crawls here and there but now they check to be able to see if Archive Team isn't on the case. |
21:43
🔗
|
SketchCow |
I love the newbies in here |
21:43
🔗
|
SketchCow |
Never gets old |
21:43
🔗
|
* |
SketchCow is working on a wiki entry for how to make items run in the Archive |
21:43
🔗
|
swebb |
Heritrix - I've used that before. |
21:44
🔗
|
swebb |
I could just fire it up and point it at Al Jazeera America. |
21:44
🔗
|
swebb |
It creates WARCs |
21:45
🔗
|
|
dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) |
21:47
🔗
|
|
dashcloud has joined #archiveteam |
21:50
🔗
|
swebb |
Who you callin' a noob? :) |
21:50
🔗
|
snape |
You love the newbies who don't speak of themselves in the third person, anyway... |
21:51
🔗
|
joepie91 |
lol |
21:51
🔗
|
joepie91 |
also, archivebot has been on AJA since mid-january |
21:52
🔗
|
swebb |
Is archivebot Heritrix? |
21:53
🔗
|
ersi |
nope |
21:53
🔗
|
xmc |
i think it's wpull |
21:54
🔗
|
ersi |
snape: oh don't start you ' |
21:55
🔗
|
snape |
Wouldn't dare. :) |
21:57
🔗
|
|
Tomcat_ has quit IRC (Remote host closed the connection) |
22:04
🔗
|
swebb |
I started a Heritrix crawl of AlJazeeraAmerica |
22:04
🔗
|
swebb |
https://www.evernote.com/l/ACms9qHxNSRPjIJLcdAHBUIGWHEJMNKI3zY |
22:07
🔗
|
joepie91 |
yep, archivebot is wpull |
22:19
🔗
|
|
JetBalsa has joined #archiveteam |
22:19
🔗
|
|
SN4T14 has quit IRC (Quit: Leaving) |
22:21
🔗
|
|
scyther has quit IRC (Quit: Leaving) |
22:42
🔗
|
|
nickname_ has quit IRC (Read error: Operation timed out) |
22:47
🔗
|
SketchCow |
Is Gametrailers still going? |
22:47
🔗
|
|
nickname_ has joined #archiveteam |
22:48
🔗
|
HCross |
SketchCow, all items are out, just uploads are slow |
22:49
🔗
|
MrRadar |
Yes, I've got items that probably won't finish uploading until Saturday |
22:49
🔗
|
MrRadar |
At current upload speeds |
22:49
🔗
|
HCross |
^^ |
22:50
🔗
|
HCross |
I've got over 100GB to upload |
22:52
🔗
|
HCross |
at 50kB/s |
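For a sense of scale, the figures HCross gives can be turned into an estimated upload time. This is back-of-the-envelope arithmetic assuming decimal units (1 GB = 10^9 bytes, 1 kB = 10^3 bytes) and a constant rate, neither of which is stated in the log.

```python
def upload_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to upload `size_bytes` at a constant `rate_bytes_per_s`."""
    return size_bytes / rate_bytes_per_s


GB = 10**9  # decimal gigabyte (assumption)
KB = 10**3  # decimal kilobyte (assumption)

seconds = upload_seconds(100 * GB, 50 * KB)
days = seconds / 86_400  # seconds per day
print(f"{seconds:,.0f} s is about {days:.1f} days")  # 2,000,000 s is about 23.1 days
```

At that rate a 100 GB backlog would take a little over three weeks, so any rate improvement on the receiving side shortens the wait considerably.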
23:03
🔗
|
swebb |
Whenever I use Heritrix for crawling archiveteam.org stuff, I always use the useragent: Mozilla/5.0 (compatible; heritrix/1.14.4 +http://archiveteam.org) |
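The same identifying user agent can be attached to any one-off request, not just a Heritrix crawl. A minimal sketch using only the Python standard library follows; the `build_request` helper is a hypothetical name, and the example only constructs the request object without sending it.

```python
import urllib.request

# User-agent string quoted in the message above.
ARCHIVETEAM_UA = "Mozilla/5.0 (compatible; heritrix/1.14.4 +http://archiveteam.org)"


def build_request(url: str) -> urllib.request.Request:
    """Create a request that announces itself with the Archive Team user agent."""
    return urllib.request.Request(url, headers={"User-Agent": ARCHIVETEAM_UA})


request = build_request("http://example.com/")
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

Passing `request` to `urllib.request.urlopen` would perform the fetch; including a contact URL in the user agent gives site operators a way to find out who is crawling them.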
23:09
🔗
|
|
brayden has joined #archiveteam |
23:09
🔗
|
|
swebb sets mode: +o brayden |
23:11
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
23:14
🔗
|
arkiver |
SketchCow: yes |
23:15
🔗
|
|
swebb sets mode: +o arkiver |
23:15
🔗
|
|
Start has joined #archiveteam |
23:15
🔗
|
|
brayden_ has quit IRC (Read error: Operation timed out) |
23:51
🔗
|
|
mutoso has quit IRC (Ping timeout: 260 seconds) |