#archiveteam 2016-02-11,Thu


Time Nickname Message
00:11 🔗 Start has joined #archiveteam
00:23 🔗 maseck_ has quit IRC (Quit: No Ping reply in 180 seconds.)
00:23 🔗 maseck has joined #archiveteam
00:42 🔗 hive-mind has quit IRC (Ping timeout: 260 seconds)
00:43 🔗 hive-mind has joined #archiveteam
00:45 🔗 wyatt8740 has quit IRC (Read error: Operation timed out)
00:45 🔗 wyatt8740 has joined #archiveteam
00:52 🔗 Ghost_of_ has joined #archiveteam
00:55 🔗 Atom__ has quit IRC (Read error: Connection reset by peer)
00:56 🔗 Atom__ has joined #archiveteam
01:00 🔗 wyatt8740 has quit IRC (Read error: Operation timed out)
01:01 🔗 wyatt8740 has joined #archiveteam
01:28 🔗 kyan has joined #archiveteam
01:51 🔗 vitzli has joined #archiveteam
02:01 🔗 Stiletto has quit IRC (Read error: Connection reset by peer)
02:09 🔗 philpem has quit IRC (Ping timeout: 260 seconds)
02:13 🔗 JesseW has joined #archiveteam
02:51 🔗 Ghost_of_ has quit IRC (Quit: Leaving)
03:20 🔗 vtyl has joined #archiveteam
03:21 🔗 lytv has quit IRC (Read error: Operation timed out)
03:21 🔗 einstein9 has joined #archiveteam
03:45 🔗 kyan has quit IRC (This computer has gone to sleep)
03:48 🔗 vitzli has quit IRC (Leaving)
04:06 🔗 lukeman has joined #archiveteam
04:13 🔗 dashcloud Christopher Rush, one of the early Magic: the Gathering card artists, has died
04:17 🔗 ianweller has left
04:25 🔗 mutoso has quit IRC (Read error: Operation timed out)
04:34 🔗 mutoso has joined #archiveteam
04:43 🔗 mutoso has quit IRC (Ping timeout: 252 seconds)
04:50 🔗 mutoso has joined #archiveteam
04:51 🔗 Coderjoe has quit IRC (Read error: Operation timed out)
05:17 🔗 mutoso has quit IRC (Read error: Operation timed out)
05:21 🔗 Coderjoe has joined #archiveteam
05:22 🔗 ndiddy has quit IRC (Quit: Leaving)
05:41 🔗 Sk1d has quit IRC (Ping timeout: 200 seconds)
05:47 🔗 Sk1d has joined #archiveteam
06:03 🔗 vitzli has joined #archiveteam
06:11 🔗 vitzli has quit IRC (Leaving)
06:22 🔗 WinterFox has joined #archiveteam
06:22 🔗 vitzli has joined #archiveteam
07:26 🔗 megaminxw has joined #archiveteam
07:56 🔗 Fletcher Are any projects hitting FOS particularly hard? I've dropped to 1MB/s upload
07:57 🔗 einstein9 GameTrailers is ~1GB per video
07:58 🔗 Fletcher ah, that may be where the problem is
07:59 🔗 Fletcher (my archivebot upload directory has ballooned to 600G)
07:59 🔗 schbirid has joined #archiveteam
08:03 🔗 JesseW Fletcher: which pipelines do you run, anyway?
08:04 🔗 Fletcher F_*
08:05 🔗 JesseW ah, cool
08:06 🔗 JesseW thank you for running those
08:06 🔗 JesseW Do you know who runs the aupipe ones?
08:07 🔗 Fletcher no idea I'm afraid :/
08:08 🔗 Fletcher should be able to trace it back to when the ssh key was submitted though
08:09 🔗 JesseW hm, that's not public info, though, right?
08:10 🔗 JesseW and what about nico-only_at_home (which seems to have been stuck for a couple of weeks)?
08:11 🔗 signius has quit IRC (Remote host closed the connection)
08:13 🔗 signius has joined #archiveteam
08:18 🔗 Fletcher yipdw would be the only one with key info
08:18 🔗 JesseW has quit IRC (Quit: Leaving.)
08:18 🔗 yipdw trs80 runs aupipe
08:19 🔗 yipdw I think I have SSH access to it from the control node
08:32 🔗 kyan has joined #archiveteam
08:48 🔗 kyan has quit IRC (Leaving)
09:03 🔗 trs80 hmm? yeah, I have aupipe
09:03 🔗 trs80 jessew, fletcher: ^^
09:08 🔗 Stiletto has joined #archiveteam
09:23 🔗 Sk1d has left
09:34 🔗 Stilett0 has joined #archiveteam
09:39 🔗 Stiletto has quit IRC (Read error: Operation timed out)
09:41 🔗 Stilett0 is now known as Stiletto
09:51 🔗 lukeman has quit IRC (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
09:57 🔗 Stilett0 has joined #archiveteam
09:57 🔗 Stiletto has quit IRC (Read error: Connection reset by peer)
10:01 🔗 Stilett0 is now known as Stiletto
10:01 🔗 Stiletto has quit IRC (Remote host closed the connection)
10:01 🔗 Stiletto has joined #archiveteam
10:04 🔗 mutoso has joined #archiveteam
10:13 🔗 SketchCow Gah
10:13 🔗 SketchCow OK
10:38 🔗 SketchCow Load average of FOS is now up to 40, what could go wrong
10:39 🔗 midas could go up to 80
10:41 🔗 SketchCow Could go down to 0
10:43 🔗 vtyl has quit IRC (Quit: Leaving)
10:45 🔗 Cameron_D has joined #archiveteam
10:49 🔗 lytv has joined #archiveteam
10:56 🔗 bzc6p has joined #archiveteam
10:56 🔗 swebb sets mode: +o bzc6p
10:58 🔗 Stiletto is now known as Stilett0
11:02 🔗 alberto has joined #archiveteam
11:06 🔗 bzc6p lol
11:06 🔗 bzc6p Chorca: you said 40 terabytes? That's awesome.
11:07 🔗 bzc6p einstein9: Not in Hungary.
11:08 🔗 einstein9 k
11:13 🔗 bzc6p sets mode: +oooo achip aliz chazchaz chfoo
11:13 🔗 bzc6p sets mode: +oooo chfoo- closure Coderjoe Ctrl-S___
11:13 🔗 bzc6p sets mode: +oooo dashcloud Fletcher Fletcher_ Fusl
11:13 🔗 bzc6p sets mode: +oooo GLaDOS godane HCross2 Infreq_
11:14 🔗 bzc6p sets mode: +oooo ivan` joepie91 Kazzy Kenshin
11:14 🔗 bzc6p sets mode: +oooo midas Muad-Dib Nemo_bis ohhdemgir
11:14 🔗 bzc6p sets mode: +oooo phuzion PurpleSym Sanqui schbirid
11:14 🔗 bzc6p sets mode: +oooo SimpBrain SmileyG Start trs80
11:14 🔗 bzc6p sets mode: +ooo vitzli wp494 wyatt8740
11:43 🔗 bzc6p_ has joined #archiveteam
11:43 🔗 swebb sets mode: +o bzc6p_
11:44 🔗 bzc6p has quit IRC (Ping timeout: 250 seconds)
11:55 🔗 WinterFox has quit IRC (Remote host closed the connection)
12:01 🔗 einstein9 has quit IRC (Read error: Operation timed out)
12:05 🔗 bzc6p_ has quit IRC (Ping timeout: 250 seconds)
12:39 🔗 bzc6p has joined #archiveteam
12:39 🔗 swebb sets mode: +o bzc6p
12:43 🔗 signius_ has joined #archiveteam
12:44 🔗 arkiver3 has joined #archiveteam
12:45 🔗 signius has quit IRC (Read error: Operation timed out)
12:51 🔗 arkiver3 has quit IRC (Ping timeout: 252 seconds)
12:51 🔗 arkiver3 has joined #archiveteam
12:55 🔗 arkiver3 has quit IRC (Ping timeout: 252 seconds)
12:56 🔗 arkiver3 has joined #archiveteam
12:56 🔗 signius_ has quit IRC (Remote host closed the connection)
12:58 🔗 weles has joined #archiveteam
12:58 🔗 signius has joined #archiveteam
13:15 🔗 bzc6p has quit IRC (Ping timeout: 250 seconds)
13:17 🔗 Atom-- has joined #archiveteam
13:20 🔗 arkiver3 has quit IRC (Ping timeout: 252 seconds)
13:20 🔗 Atom__ has quit IRC (Ping timeout: 252 seconds)
13:20 🔗 Atom__ has joined #archiveteam
13:21 🔗 Atom-- has quit IRC (Ping timeout: 252 seconds)
13:22 🔗 arkiver3 has joined #archiveteam
13:22 🔗 bzc6p has joined #archiveteam
13:22 🔗 swebb sets mode: +o bzc6p
13:27 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
13:28 🔗 schbirid http://www.factmag.com/2016/02/11/soundcloud-financial-report-44m-losses/
13:36 🔗 schbirid https://soundcloud.com/people/directory
13:37 🔗 schbirid track ids go up to ~300 million
13:43 🔗 Stiletto has joined #archiveteam
13:46 🔗 arkiver3 has quit IRC (Ping timeout: 252 seconds)
14:10 🔗 megaminxw has quit IRC (Quit: Leaving.)
14:15 🔗 arkiver3 has joined #archiveteam
14:37 🔗 [phire] has quit IRC (Quit: ZNC - http://znc.in)
14:38 🔗 vOYtEC_ has quit IRC (Quit: rm -r *)
14:54 🔗 [phire] has joined #archiveteam
14:59 🔗 Start has quit IRC (Quit: Disconnected.)
15:00 🔗 johtso that's a whole lot of data..
15:03 🔗 balrog shit. http://www.factmag.com/2016/02/11/soundcloud-financial-report-44m-losses/
15:03 🔗 balrog I see people posted it
15:03 🔗 balrog ugh the copyright cartel has been after them
15:04 🔗 balrog even though they are mostly user-generated content
15:04 🔗 MrRadar That's like how the RIAA requires any public place that allows bands to play to get a license even if they require the bands to only play original music "just in case" they accidentally play a cover
15:06 🔗 bzc6p There was a discussion about SoundCloud back in August: http://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-08-27,Thu&sel=180#l176
15:06 🔗 bzc6p SketchCow There's no way we can get Soundcloud
15:06 🔗 balrog "2.5 PB of data"
15:06 🔗 balrog yikes.
15:07 🔗 bzc6p (it was a quote)
15:07 🔗 balrog but how do you identify the original content?
15:07 🔗 vitzli is there a way to get certain channels?
15:08 🔗 MrRadar youtube-dl supports SoundCloud
15:08 🔗 MrRadar So if you have any artists you follow now is the time to rip them
15:08 🔗 balrog grr lots of podcasts host there
15:09 🔗 vitzli yep, going after startalk radio
15:09 🔗 vitzli thank you
15:22 🔗 bzc6p has left
15:24 🔗 arkiver3 If we want a project for SoundCloud and SketchCow agrees we can start a project
15:24 🔗 vitzli maybe do census over wikipedia external links to soundcloud
15:24 🔗 vitzli &
15:24 🔗 vitzli dammit, it was a question mark, sorry
15:27 🔗 arkiver3 Who is recording the gravitational waves livestream?(!!)
15:28 🔗 arkiver3 start in 2 minutes!
15:28 🔗 arkiver3 can't record it from where I am, so someone please record
15:29 🔗 snape URL?
15:31 🔗 arkiver3 https://www.youtube.com/watch?v=c7293kAiPZw
15:33 🔗 MrRadar arkiver: Can I just throw the URL into youtube-dl?
15:34 🔗 arkiver3 I don't think it records livestreams
15:34 🔗 MrRadar What's the recommended method then?
15:34 🔗 arkiver3 But I think the YouTube stream will also be online as a video after the stream
15:34 🔗 arkiver3 and I'm totally sure someone else is recording this
15:35 🔗 snape With youtube-dl mpegts complains about continuity errors, sigh.
15:36 🔗 joepie91 arkiver3: see #archivebot re: al jazeera
15:37 🔗 joepie91 MrRadar: arkiver3: yes, youtube-dl records livestreams in theory, but it has never worked for me
15:38 🔗 MrRadar Hmm, youtube-dl is grabbing a bunch of segments for me though I don't know if it will continue
15:39 🔗 Start has joined #archiveteam
15:46 🔗 Zei-Pii has joined #archiveteam
15:51 🔗 zzqw has quit IRC (Ping timeout: 252 seconds)
15:52 🔗 z00nx has quit IRC (Ping timeout: 252 seconds)
15:52 🔗 Fletcher has quit IRC (Ping timeout: 252 seconds)
15:52 🔗 goekesmi has quit IRC (Ping timeout: 250 seconds)
15:53 🔗 schbirid soundcloud STORING 2.5 PB will be including original formats (wav etc) and private tracks
15:53 🔗 Rickster has quit IRC (Ping timeout: 260 seconds)
15:54 🔗 HCross has quit IRC (Ping timeout: 250 seconds)
15:55 🔗 rduser has quit IRC (Ping timeout: 260 seconds)
15:55 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
15:56 🔗 midas has quit IRC (Ping timeout: 260 seconds)
15:56 🔗 ivan` has quit IRC (Read error: Operation timed out)
15:57 🔗 arkiver3 has quit IRC (Quit: Nettalk6 - www.ntalk.de)
15:57 🔗 Test__ has joined #archiveteam
15:58 🔗 sevs44936 has quit IRC (Ping timeout: 633 seconds)
15:59 🔗 Test__ has quit IRC (Client Quit)
16:02 🔗 Rickster has joined #archiveteam
16:03 🔗 hawc145 has joined #archiveteam
16:03 🔗 z00nx has joined #archiveteam
16:04 🔗 sevs44936 has joined #archiveteam
16:05 🔗 rduser has joined #archiveteam
16:05 🔗 midas has joined #archiveteam
16:09 🔗 zzqw has joined #archiveteam
16:09 🔗 marvinw has joined #archiveteam
16:11 🔗 godane has quit IRC (Read error: Operation timed out)
16:14 🔗 goekesmi has joined #archiveteam
16:16 🔗 schbirid "youtube-dl http://api.soundcloud.com/tracks/182804938" works by id. gets best audio format. this one is wav (and a fine mix)
16:16 🔗 sevs44936 has quit IRC (Read error: Operation timed out)
16:17 🔗 sevs44936 has joined #archiveteam
16:19 🔗 K4k has joined #archiveteam
16:21 🔗 Start has quit IRC (Quit: Disconnected.)
16:21 🔗 Famicoman has joined #archiveteam
16:24 🔗 Start has joined #archiveteam
16:25 🔗 Start has quit IRC (Remote host closed the connection)
16:25 🔗 Start has joined #archiveteam
16:34 🔗 godane has joined #archiveteam
16:36 🔗 SketchCow 1. Please go after Al-Jazeera America.
16:36 🔗 SketchCow 2. Please at least go after the most popular soundcloud stuff
16:40 🔗 Fletcher has joined #archiveteam
16:42 🔗 swebb I can send out what I built with regards to the soundcloud grab.
16:42 🔗 swebb Userlists, etc ...
16:44 🔗 hawc145 is now known as HCross
16:51 🔗 snape Might be worth trying to grab all the CBC podcasts off there, as they have a history of quietly disappearing over time, elsewhere.
16:52 🔗 SketchCow Please do
16:56 🔗 snape Can anyone think of other gov't-run stuff on Soundcloud? I see the Voice of America has several thousand programs in Chinese, and Germany's Deutsche Welle has almost 2300 items going back to at least 2012.
16:57 🔗 Nemo_bis Surely they'll preserve "user data" online! http://www.theguardian.com/media/2016/feb/11/time-inc-buys-what-is-left-of-myspace-for-its-user-data
17:03 🔗 snape arkiver, looks like the NSF livestream ended. I have 1.3GB of it, though I'm not sure if it's viewable or not.
17:04 🔗 schbirid #soundbutt ? :)
17:04 🔗 vOYtEC has joined #archiveteam
17:05 🔗 MrRadar LOL
17:05 🔗 MrRadar During the last soundcloud scare we used #soundclown
17:06 🔗 Fletcher_ sets mode: +o Fletcher
17:06 🔗 xmc probably should reuse that for consistency
17:06 🔗 schbirid yeah
17:07 🔗 Start has quit IRC (Quit: Disconnected.)
17:07 🔗 marvinw is now known as ivan`
17:11 🔗 JesseW has joined #archiveteam
17:20 🔗 snape FWIW, youtube-dl will (eventually...) grab all of a Soundcloud user's tracks if you point it at soundcloud.com/username/tracks
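The two youtube-dl invocations mentioned in the channel (a single track by id, and a user's full track list) can be sketched as small helpers that build the command line. This is only a sketch: the URL shapes are copied from the messages above rather than from SoundCloud documentation, the `https://` scheme on the user URL is an assumption, and the helper names are hypothetical.

```python
import shlex


def track_url(track_id: int) -> str:
    # URL shape quoted by schbirid; not checked against SoundCloud's API docs.
    return f"http://api.soundcloud.com/tracks/{track_id}"


def user_tracks_url(username: str) -> str:
    # snape quoted "soundcloud.com/username/tracks"; the https scheme is assumed.
    return f"https://soundcloud.com/{username}/tracks"


def youtube_dl_command(url: str) -> list[str]:
    # Plain invocation with no extra flags, as used in the channel.
    return ["youtube-dl", url]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is downloaded here.
    print(shlex.join(youtube_dl_command(track_url(182804938))))
    print(shlex.join(youtube_dl_command(user_tracks_url("username"))))
```

Passing the resulting list to `subprocess.run` would start the actual download, assuming youtube-dl is installed and on the PATH.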
17:25 🔗 HCross Can whoever is running a newsbuddy instance without asking please stop.
17:27 🔗 antomatic Yikes.. got a gametrailers item here that's going to take 116 hours to rsync!
17:27 🔗 antomatic all good fun. :)
17:27 🔗 swebb My archive of the soundcloud crawl that I did in November: https://www.amazon.com/clouddrive/share/7thOzbwVF2hwD5iVfXYESu1DhaksPGHDFZBJWs9FQIU?ref_=cd_ph_share_link_copy
17:28 🔗 swebb It's a mysqldump of the user data that I accumulated - hundreds of millions of users - I targeted the most followed and the people with the most followers. 2.5G bzipped. Includes the url to their profile, their user-id, username, follower number, followings number & avatar url. From that data, you can crawl their content pretty easily. Soundcloud eventually blocked me from crawling them, but I was able to crawl them for 5-6 months before they
17:28 🔗 swebb found and blocked my IP.
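A rough sketch of how the user data described above could be used to pick the most-followed accounts first. The field names are hypothetical (the dump's real column names are not given in the log), and the rows would first have to be loaded from the imported database.

```python
import heapq
from typing import Iterable, NamedTuple


class User(NamedTuple):
    # Hypothetical field names mirroring the columns swebb lists.
    profile_url: str
    user_id: int
    username: str
    followers: int
    followings: int
    avatar_url: str


def most_followed(users: Iterable[User], n: int) -> list[User]:
    # nlargest keeps only n candidates in memory, which matters when the
    # input is a stream of many millions of rows.
    return heapq.nlargest(n, users, key=lambda u: u.followers)
```

The returned profile URLs could then be fed to a downloader in descending order of follower count.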
17:30 🔗 JesseW has quit IRC (Quit: Leaving.)
17:32 🔗 JesseW has joined #archiveteam
17:33 🔗 schbirid has quit IRC (Quit: Leaving)
17:35 🔗 JesseW has quit IRC (Client Quit)
17:36 🔗 joepie91 ... 1.5MB/sec from amazon cloud drive
17:36 🔗 schbirid has joined #archiveteam
17:48 🔗 Fletcher_ has quit IRC (Quit: WeeChat 0.4.3)
17:49 🔗 Tomcat_ has joined #archiveteam
17:50 🔗 Fletcher_ has joined #archiveteam
17:53 🔗 swebb Yea, ACD isn't the most high-speed thing in the world. :)
17:53 🔗 swebb but I got unlimited storage for a year for $5. :)
17:55 🔗 swebb Hard to beat that. :)
17:58 🔗 Nemo_bis What offer is that :o
18:01 🔗 swebb It was a Christmas thing I think. The normal unlimited ACD plan is $50/yr
18:01 🔗 phuzion Nemo_bis: Amazon had a promotional offer for 1 year of ACD for $5
18:01 🔗 swebb or a black friday thing
18:01 🔗 phuzion yeah
18:01 🔗 Fletcher sets mode: +o Fletcher_
18:02 🔗 vitzli black friday deal, got one too
18:02 🔗 swebb They may do it again this year - it was pretty popular with data hoarders
18:04 🔗 vitzli how big is the database when imported? 7-8GB?
18:05 🔗 swebb Not sure, sorry.
18:05 🔗 swebb Probably larger than 10GB
18:21 🔗 rduser has quit IRC (Read error: Operation timed out)
18:34 🔗 hive-mind has quit IRC (Ping timeout: 260 seconds)
18:36 🔗 hive-mind has joined #archiveteam
18:37 🔗 Tomcat_ has quit IRC (Ping timeout: 362 seconds)
18:42 🔗 Start has joined #archiveteam
18:43 🔗 vitzli has quit IRC (Leaving)
18:44 🔗 SmileyG http://www.factmag.com/2016/02/11/soundcloud-financial-report-44m-losses/
18:45 🔗 SmileyG what's ACD?
18:46 🔗 snape Amazon Cloud Drive
18:49 🔗 Tomcat_ has joined #archiveteam
19:04 🔗 SketchCow FOS is holding up
19:04 🔗 SketchCow 69% full
19:04 🔗 SketchCow But stuff is going out
19:12 🔗 Chorca man, GT uploads going so slow
19:16 🔗 SketchCow Well, you're hammering the literal hell out of the machine.
19:16 🔗 Chorca haha, i assumed. When we first started i hit like 25MB/s
19:19 🔗 Start has quit IRC (Quit: Disconnected.)
19:23 🔗 Start has joined #archiveteam
19:25 🔗 rduser has joined #archiveteam
19:40 🔗 Tomcat_ has quit IRC (Ping timeout: 362 seconds)
19:43 🔗 swebb Oh, and here's my crawler code, BTW: https://github.com/scumola/soundcloud-crawler If you want to duplicate my crawl setup, you'll need mysql, rabbitmq and couchbase (or memcache) running somewhere on your network.
19:44 🔗 SketchCow I would choose Al Jazeera over Soundcloud
19:44 🔗 SketchCow But first we need to get through these two others.
19:44 🔗 SketchCow We also maybe need another rsync target
19:44 🔗 SketchCow That then pushes to FOS before I upload
19:46 🔗 RichardG has quit IRC (Ping timeout: 250 seconds)
19:52 🔗 RichardG has joined #archiveteam
19:54 🔗 scyther has joined #archiveteam
20:04 🔗 SimpBrain many sites failing, fos needs to breathe :P
20:05 🔗 MrRadar Yeah, it's been crazy lately
20:05 🔗 arkiver SketchCow: FOS can handle them, GameTrailers is almost done
20:12 🔗 LibreWulf has joined #archiveteam
20:12 🔗 arkiver chfoo: can you please send me the logs of gametrailers?
20:14 🔗 Sk1d has joined #archiveteam
20:14 🔗 chfoo arkiver: ok, give me a few minutes
20:14 🔗 arkiver thanks! I'd like to make sure everything went well
20:15 🔗 LibreWulf I'm not sure if anyone had heard of this, but there are rumors that soundcloud is having tough financial times.
20:15 🔗 HCross #soundclown
20:16 🔗 LibreWulf yeah essentially. what on earth will I do without my shitty joke mixes
20:16 🔗 LibreWulf But I did figure I'd stop in and at least mention it. Several sites and blogs and whatnot are doubting they'll survive a lot longer
20:17 🔗 Sanqui no, they meant, we literally have channel #soundclown for it
20:17 🔗 LibreWulf Oh, really? I had no clue, thanks
20:18 🔗 megaminxw has joined #archiveteam
20:19 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
20:25 🔗 Sk1d has joined #archiveteam
20:27 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
20:27 🔗 ploopkazo has joined #archiveteam
20:28 🔗 dashcloud has joined #archiveteam
20:33 🔗 LibreWulf has quit IRC (Quit: changing clients)
20:44 🔗 Tomcat_ has joined #archiveteam
20:45 🔗 Start has quit IRC (Quit: Disconnected.)
20:46 🔗 Zei-Pii has quit IRC (Read error: Connection reset by peer)
20:52 🔗 megaminxw has quit IRC (Quit: Leaving.)
20:58 🔗 useretail has quit IRC (Ping timeout: 252 seconds)
21:01 🔗 nickname_ has joined #archiveteam
21:01 🔗 arkiver Atluxity: are you coming back to the gametrailers grab?
21:01 🔗 nickname_ soundclown is the name of a joke service, parodying soundcloud.
21:02 🔗 Atluxity arkiver: I am unable to... the items are too big
21:03 🔗 HCross Seems BnAboyZ has you covered though
21:03 🔗 Atluxity I restart the grab a couple times a day, and they go through a pile of items, then get filled
21:03 🔗 Atluxity :\
21:04 🔗 Atluxity if the pipeline could recognize a full disk, remove it, and ask for a new item, that would help me a lot
21:04 🔗 Atluxity *remove the current item from disk
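The behaviour Atluxity asks for (notice that the disk is full, drop the current item, ask for a new one) could look roughly like the check below. This is not how the actual pipeline works; the function names and the returned action strings are hypothetical, and only `shutil.disk_usage` is a real standard-library call.

```python
import shutil
from typing import Callable


def has_room(path: str, needed_bytes: int,
             usage: Callable = shutil.disk_usage) -> bool:
    # disk_usage returns (total, used, free) in bytes for the filesystem
    # containing `path`.
    return usage(path).free >= needed_bytes


def next_action(path: str, item_bytes: int,
                usage: Callable = shutil.disk_usage) -> str:
    # Hypothetical decision step: keep working on the item if it fits,
    # otherwise signal that it should be removed and a new one requested.
    if has_room(path, item_bytes, usage):
        return "keep"
    return "remove_and_request_new"
```

The `usage` parameter exists only so the check can be exercised without filling a real disk.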
21:08 🔗 nickname_ How would you use youtube-dl to download json info from soundcloud?
21:09 🔗 weles has quit IRC (Read error: Operation timed out)
21:09 🔗 nickname_ Ignore it, it's an offtopic question
21:18 🔗 useretail has joined #archiveteam
21:19 🔗 Muad-Dib <arkiver> SketchCow: FOS can handle them, GameTrailers is almost done
21:19 🔗 Muad-Dib lol, I only just found out :")
21:19 🔗 Muad-Dib about GT dying I mean
21:33 🔗 SketchCow The uploads are going slow, but not that slow.
21:39 🔗 arkiver SketchCow: we're discussing in #soundclown on what to do
21:39 🔗 arkiver we're thinking discovery to get the most popular tracks
21:39 🔗 arkiver then we'll decide on what to grab
21:41 🔗 MrRadar Is there a channel for discussing Al Jazeera?
21:41 🔗 arkiver We'll do a small grab of all links of al jazeera from the sitemap
21:41 🔗 arkiver that should have all articles
21:41 🔗 arkiver but the project will start in the weekend
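Collecting all links from a sitemap, as arkiver describes, can be sketched with the standard library alone. This assumes the site publishes a standard sitemaps.org XML file; the log does not say how the real grab will be implemented, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_links(xml_text: str) -> list[str]:
    # Works for both <urlset> files and <sitemapindex> files, since both
    # put their URLs inside <loc> elements.
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
```

For a sitemap index, the returned links point at further sitemap files, which would each need to be fetched and parsed the same way.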
21:41 🔗 MrRadar There's also the ArchiveBot grab that's been running for about a month
21:42 🔗 swebb Can't archive.org just archive Al Jazeera America?
21:42 🔗 swebb It looks to be a pretty normal website.
21:42 🔗 arkiver I think we're the contributors to IA that go into specific sites
21:42 🔗 arkiver IA more goes into wide crawls
21:42 🔗 swebb Ahh.
21:43 🔗 arkiver SketchCow would know best though
21:43 🔗 MrRadar It would also be nice for future researchers to be able to download WARCs with the entire site
21:43 🔗 swebb I could just spin up an archive crawler myself - I've done that in the past.
21:43 🔗 MrRadar Instead of having to trudge through the Wayback Machine
21:43 🔗 SketchCow Generally, they do focused crawls here and there but now they check to see whether Archive Team is already on the case.
21:43 🔗 SketchCow I love the newbies in here
21:43 🔗 SketchCow Never gets old
21:43 🔗 * SketchCow is working on a wiki entry for how to make items run in the Archive
21:43 🔗 swebb Heritrix - I've used that before.
21:44 🔗 swebb I could just fire it up and point it at Al Jazeera America.
21:44 🔗 swebb It creates WARCs
21:45 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
21:47 🔗 dashcloud has joined #archiveteam
21:50 🔗 swebb Who you callin' a noob? :)
21:50 🔗 snape You love the newbies who don't speak of themselves in the third person, anyway...
21:51 🔗 joepie91 lol
21:51 🔗 joepie91 also, archivebot has been on AJA since mid-january
21:52 🔗 swebb Is archivebot Heritrix?
21:53 🔗 ersi nope
21:53 🔗 xmc i think it's wpull
21:54 🔗 ersi snape: oh don't start you '
21:55 🔗 snape Wouldn't dare. :)
21:57 🔗 Tomcat_ has quit IRC (Remote host closed the connection)
22:04 🔗 swebb I started a Heritrix crawl of AlJazeeraAmerica
22:04 🔗 swebb https://www.evernote.com/l/ACms9qHxNSRPjIJLcdAHBUIGWHEJMNKI3zY
22:07 🔗 joepie91 yep, archivebot is wpull
22:19 🔗 JetBalsa has joined #archiveteam
22:19 🔗 SN4T14 has quit IRC (Quit: Leaving)
22:21 🔗 scyther has quit IRC (Quit: Leaving)
22:42 🔗 nickname_ has quit IRC (Read error: Operation timed out)
22:47 🔗 SketchCow Is Gametrailers still going?
22:47 🔗 nickname_ has joined #archiveteam
22:48 🔗 HCross SketchCow, all items are out, just uploads are slow
22:49 🔗 MrRadar Yes, I've got items that probably won't finish uploading until Saturday
22:49 🔗 MrRadar At current upload speeds
22:49 🔗 HCross ^^
22:50 🔗 HCross I've got over 100GB to upload
22:52 🔗 HCross at 50kB/s
23:03 🔗 swebb Whenever I use Heritrix for crawling archiveteam.org stuff, I always use the useragent: Mozilla/5.0 (compatible; heritrix/1.14.4 +http://archiveteam.org)
23:09 🔗 brayden has joined #archiveteam
23:09 🔗 swebb sets mode: +o brayden
23:11 🔗 schbirid has quit IRC (Quit: Leaving)
23:14 🔗 arkiver SketchCow: yes
23:15 🔗 swebb sets mode: +o arkiver
23:15 🔗 Start has joined #archiveteam
23:15 🔗 brayden_ has quit IRC (Read error: Operation timed out)
23:51 🔗 mutoso has quit IRC (Ping timeout: 260 seconds)
