#archiveteam-bs 2016-02-24,Wed

Time Nickname Message
00:03 🔗 Darkstar has quit IRC (Remote host closed the connection)
00:10 🔗 nickname_ has joined #archiveteam-bs
00:28 🔗 Darkstar has joined #archiveteam-bs
00:34 🔗 SketchCow It's my brother's business
00:38 🔗 JW_work1 Oh right — I think I'd heard you mention that in one of your talks.
01:02 🔗 dan- speaking of gitlab
01:02 🔗 dan- if you're looking for a self-hosted git thing, I really like gogs: https://gogs.io/
01:02 🔗 dan- been using it for a while on a raspi and it's quite nice
01:28 🔗 nickname_ has quit IRC (Read error: Operation timed out)
01:35 🔗 JesseW has joined #archiveteam-bs
01:36 🔗 VADemon has joined #archiveteam-bs
01:36 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
02:01 🔗 Boppen has quit IRC (hub.se irc.du.se)
02:01 🔗 Rotab has quit IRC (hub.se irc.du.se)
02:11 🔗 JesseW has quit IRC (Quit: Leaving.)
02:34 🔗 JesseW has joined #archiveteam-bs
02:36 🔗 MrRadar !ao https://twitter.com/TheFriddle/status/702316134008119297 --phantomjs
02:37 🔗 JesseW MrRadar: wrong channel
03:29 🔗 JesseW has quit IRC (Quit: Leaving.)
04:02 🔗 bwn has quit IRC (Ping timeout: 492 seconds)
04:28 🔗 ndiddy has quit IRC (Read error: Connection reset by peer)
04:48 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
05:11 🔗 Lord_Nigh has quit IRC (Ping timeout: 250 seconds)
05:27 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
05:32 🔗 JesseW has joined #archiveteam-bs
05:33 🔗 Lord_Nigh has joined #archiveteam-bs
05:34 🔗 Sk1d has joined #archiveteam-bs
05:42 🔗 JesseW https://archive.org/metadata/FourFiveSecondsRihannaKanyeWestPM4A32kbps <- an item whose identifier doesn't match the identifier stored in the metadata
05:42 🔗 JesseW sent in to info@
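The mismatch JesseW describes can be checked mechanically. A minimal sketch, assuming the metadata endpoint returns JSON with a top-level `metadata` object that carries an `identifier` field; the helper names and the sample records are made up for illustration and are not real IA data.

```python
import json
import urllib.request


def stored_identifier(record):
    """Return the identifier stored inside a metadata record, or None."""
    return record.get("metadata", {}).get("identifier")


def identifier_matches(requested, record):
    """True when the identifier used in the URL equals the stored one."""
    return stored_identifier(record) == requested


def fetch_record(identifier):
    """Fetch https://archive.org/metadata/<identifier> (needs network access)."""
    url = "https://archive.org/metadata/" + identifier
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Offline demonstration with hand-written records (hypothetical data).
good = {"metadata": {"identifier": "example-item"}}
bad = {"metadata": {"identifier": "some-other-item"}}
print(identifier_matches("example-item", good))
print(identifier_matches("example-item", bad))
```

For a live check, `identifier_matches(some_id, fetch_record(some_id))` would compare the identifier in the URL against whatever the record reports.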
06:07 🔗 JesseW It looks like there may be some magic with IA identifiers that are the same as mediatypes (and are collections), in that they seem to include all the items with that mediatype, whether or not the item in question explicitly includes itself in the collection. If someone from IA wanted to confirm that, I'd be grateful.
07:01 🔗 HCross2 JW_work1 done it
07:04 🔗 metalcamp has joined #archiveteam-bs
07:04 🔗 JesseW HCross2: it didn't seem to take; I re-did it and it did then...
07:04 🔗 HCross2 Ah. Thanks
08:14 🔗 bwn has joined #archiveteam-bs
08:27 🔗 JesseW has quit IRC (Quit: Leaving.)
08:32 🔗 schbirid has joined #archiveteam-bs
09:12 🔗 kvieta has quit IRC (Read error: Operation timed out)
09:12 🔗 acridAxid has quit IRC (Read error: Operation timed out)
09:12 🔗 godane SketchCow: i'm up to 2009-07-31 for kpfa
09:13 🔗 godane i also have all of 2009-08 now
09:15 🔗 logchfoo2 starts logging #archiveteam-bs at Wed Feb 24 09:15:31 2016
09:15 🔗 logchfoo2 has joined #archiveteam-bs
09:17 🔗 beardicus has joined #archiveteam-bs
09:17 🔗 acridAxid has joined #archiveteam-bs
09:51 🔗 logchfoo1 starts logging #archiveteam-bs at Wed Feb 24 09:51:12 2016
09:51 🔗 logchfoo1 has joined #archiveteam-bs
10:21 🔗 Coderjoe has quit (hub.efnet.us ircd.choopa.net)
10:21 🔗 Stiletto has quit (hub.efnet.us ircd.choopa.net)
10:21 🔗 beardicus has quit (hub.efnet.us ircd.choopa.net)
10:21 🔗 botpie91 has quit (hub.efnet.us ircd.choopa.net)
10:21 🔗 closure has quit (hub.efnet.us ircd.choopa.net)
10:21 🔗 logchfoo1 has quit (hub.efnet.us ircd.choopa.net)
10:21 🔗 kvieta has quit (hub.efnet.us ircd.choopa.net)
10:21 🔗 mistym- has quit (hub.efnet.us ircd.choopa.net)
10:29 🔗 Coderjoe (~tward@[redacted]) has joined #archiveteam-bs
10:29 🔗 Stiletto (~Stiletto@[redacted]) has joined #archiveteam-bs
10:35 🔗 beardicus (~beardicus@[redacted]) has joined #archiveteam-bs
10:35 🔗 botpie91 (~botpie91@[redacted]) has joined #archiveteam-bs
10:35 🔗 closure (~lambda@[redacted]) has joined #archiveteam-bs
10:35 🔗 mistym- (~mistym@[redacted]) has joined #archiveteam-bs
10:35 🔗 kvieta (~kvieta@[redacted]) has joined #archiveteam-bs
10:56 🔗 VADemon (~VADemon@[redacted]) has joined #archiveteam-bs
12:16 🔗 godane (~slacker@[redacted]) has joined #archiveteam-bs
12:50 🔗 simpwork (~androirc@[redacted]) has joined #archiveteam-bs
13:33 🔗 godane so this is interesting
13:34 🔗 godane i'm going to get an mp3 from kpfa that's 750mb
13:34 🔗 godane thats about 3 DAYS of audio in one mp3 file
13:38 🔗 godane this is the reason: Fund Drive Special – September 29, 2009
13:38 🔗 godane what's funny is they say the episode is no longer available: https://kpfa.org/archives/2009/9/29/
13:39 🔗 godane but i'm still able to grab everything
13:58 🔗 SmileyG Nice
14:51 🔗 metalcamp has quit (Ping timeout: 252 seconds)
15:23 🔗 Start has quit (Quit: Disconnected.)
15:58 🔗 Start (~Start@[redacted]) has joined #archiveteam-bs
16:27 🔗 Sk2d (~Sk1d@[redacted]) has joined #archiveteam-bs
16:33 🔗 Sk1d has quit (hub.se irc.du.se)
16:49 🔗 Sk2d is now known as Sk1d
16:55 🔗 Boppen (~Boppen@[redacted]) has joined #archiveteam-bs
16:56 🔗 Rotab (~Rotab@[redacted]) has joined #archiveteam-bs
17:04 🔗 Start has quit (Quit: Disconnected.)
17:04 🔗 JesseW (~jesse@[redacted]) has joined #archiveteam-bs
17:05 🔗 Boppen has quit (hub.se irc.du.se)
17:05 🔗 Rotab has quit (hub.se irc.du.se)
17:16 🔗 SadDM (~SadDM@[redacted]) has joined #archiveteam-bs
17:16 🔗 swebb gives channel operator status to SadDM
17:31 🔗 JesseW has quit (Quit: Leaving.)
17:53 🔗 mr-b has quit (Read error: Operation timed out)
17:53 🔗 dashcloud has quit (Read error: Operation timed out)
17:53 🔗 SadDM has quit (Read error: Operation timed out)
17:53 🔗 acridAxid has quit (Read error: Operation timed out)
17:54 🔗 Darkstar has quit (Read error: Operation timed out)
17:54 🔗 jspiros has quit (Read error: Operation timed out)
17:54 🔗 SN4T14 has quit (Read error: Operation timed out)
17:54 🔗 matthusby has quit (Ping timeout: 246 seconds)
17:54 🔗 simpwork has quit (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
17:55 🔗 chfoo- has quit (Ping timeout: 246 seconds)
17:55 🔗 remsen has quit (Ping timeout: 246 seconds)
17:55 🔗 rduser has quit (Ping timeout: 246 seconds)
17:58 🔗 dashcloud (~quassel@[redacted]) has joined #archiveteam-bs
18:02 🔗 acridAxid (~acridAxid@[redacted]) has joined #archiveteam-bs
18:02 🔗 remsen (~remsen@[redacted]) has joined #archiveteam-bs
18:02 🔗 SN4T14 (~SN4T14@[redacted]) has joined #archiveteam-bs
18:03 🔗 rduser (~rduser@[redacted]) has joined #archiveteam-bs
18:04 🔗 mr-b (~mr-b@[redacted]) has joined #archiveteam-bs
18:20 🔗 chfoo- (~chfooZnc@[redacted]) has joined #archiveteam-bs
18:24 🔗 SimpBrain just a quick poll, looking at getting a domain, these new domain extensions worth it?
18:28 🔗 snape No. (Though, full disclosure, I do own a .ninja domain, just because.)
18:32 🔗 snape is strictly old-school with TLDs - .com if you sell stuff, .net if you're an ISP, .org if you're not, and .biz if you're a spammer. /s
18:33 🔗 xmc nah spammers use .info because it's $3
18:38 🔗 snape Oh yeah, I forgot about .info. It's like the new .name, which was the new .me, which was the new .biz... stupid spammers. :/
18:40 🔗 metalcamp (~metalcamp@[redacted]) has joined #archiveteam-bs
18:44 🔗 SimpBrain yeah, have like 3 domains, and these new TLDs are tempting, but they only exist to address the mass hoarding by people buying and sitting on dead domains
18:51 🔗 snape Make sure it's a TLD that allows private/anon registration. I reg'd a no-private-registrations domain in my cat's name in 2008 and I still get spam (and physical junk mail) addressed to him from that today. Got some kind of fricking campaign flyer in the mail on Monday, actually. :/
18:52 🔗 JW_work1 SimpBrain: heck no — better the devel (er devil) you know.
18:53 🔗 acridAxid SketchCow: is there a tutorial or getting started guide for contributing ISOs to Archive.org's Shareware CD Archive?
18:54 🔗 Darkstar (~darkstar@[redacted]) has joined #archiveteam-bs
18:58 🔗 SimpBrain thx will avoid them
19:06 🔗 HCross I really want to see .stupid as a tld
19:06 🔗 HCross imagine .exe as a tld
19:06 🔗 schbirid argh https://gethttpsforfree.com/ does not support renewing :(
19:06 🔗 schbirid .scr
19:09 🔗 bwn has quit (Ping timeout: 250 seconds)
19:09 🔗 snape I think TLDs officially jumped the shark when we got .moe and .shiksa, nothing will surprise me anymore...
19:10 🔗 yipdw hey .moe is the best thing to happen to the internet since TCP/IP
19:14 🔗 yipdw I would also support .exe TLDs because then we could have rockman.exe and hell yeah
19:14 🔗 snape thinks .moe is one of the worst things to happen to the Internet since fingering users was a thing, but to each their own...
19:17 🔗 yipdw to be fair, the only .moe domain I really know of is llvm.moe
19:18 🔗 MrRadar There was also archive.moe which was a 4chan archive site
19:19 🔗 MrRadar That's actually a perfect use of that TLD, IMO
19:19 🔗 schbirid moe.moe.moe should be an invisiblecow clone with accent
19:21 🔗 yipdw ICANN and AR will make anime real
19:21 🔗 yipdw imagine the possibilities
19:21 🔗 yipdw all-year anime conventions
19:21 🔗 snape I don't think I've knowingly visited one, but I bet we can guess twenty working websites without even trying. desu.moe, pantsu.moe, baka.moe, oneesame.moe, tsundere.moe, waifu.moe, galge.moe... I'm sure at least three of those exist, probably more. Sigh.
19:21 🔗 yipdw I can see literally nothing wrong with this
19:22 🔗 schbirid wait, what does moe stand for?
19:22 🔗 SimpBrain is confused too
19:23 🔗 snape It's wapanese for "cute".
19:23 🔗 schbirid oh...
19:23 🔗 schbirid *cringe*
19:26 🔗 yipdw it is probably helpful to know that I have factored out the /s from everything I write
19:27 🔗 snape Yeah, I figured that out after the halfchan comment, lol.
19:28 🔗 HCross swebb, we should probably talk about al jazeera here
19:29 🔗 HCross Would youtube-dl work if I setup a grab site pipeline with it, and just used http://video.aljazeera.com/channels/eng etc?
19:39 🔗 bwn (~bwn@[redacted]) has joined #archiveteam-bs
19:40 🔗 snape Five of those seven .moe domains I guessed are registered. I'm not brave enough to see if there are websites, tho, lol.
19:40 🔗 HCross it seems to be doing it
19:46 🔗 yipdw oh I was hoping microsoft.moe would have chibized OS-tans
19:47 🔗 yipdw Microsoft, get on that
19:50 🔗 SketchCow acridAxid: What do you have and how many?
19:50 🔗 SketchCow I occasionally scan the general software pile to find ones people uploaded.
19:59 🔗 HCross can I throw things straight at the web collection?
20:01 🔗 swebb Not sure.
20:04 🔗 swebb I'm grabbing the Video site html too.
20:04 🔗 swebb 27M AlJazeeraVideo-20160224195632985
20:04 🔗 swebb 680M AlJazeera-20160224193701857
20:04 🔗 swebb I didn't include that one in the original crawl.
20:05 🔗 swebb Actually, it would be best to have them all together, huh? I should just put them all together.
20:06 🔗 HCross Do you have the BW to do so?
20:07 🔗 swebb Sure.
20:07 🔗 swebb but I can't crawl the video like you're doing.
20:08 🔗 swebb I was talking about the heritrix crawls. :)
20:08 🔗 swebb I've got 1Gbit to my home.
20:08 🔗 HCross better to have 2 copies of it, than no video
20:08 🔗 swebb I'm not sure how to set up the video crawl stuff that you're doing.
20:09 🔗 Start (~Start@[redacted]) has joined #archiveteam-bs
20:10 🔗 swebb The video site shows 11164 videos. :)
20:23 🔗 swebb What are you using to pipe output to the webpage like that?
20:24 🔗 HCross its all grab-site
20:24 🔗 HCross its the same system archivebot uses
20:26 🔗 metalcamp has quit (Ping timeout: 252 seconds)
20:38 🔗 yipdw well kinda
21:01 🔗 bsmith093 (68f4c20a@[redacted]) has joined #archiveteam-bs
21:21 🔗 ivan` grab-site's ignores have diverged
21:24 🔗 HCross I know what my prob
21:25 🔗 HCross problem was, megawarc is for rsyncs in, not me running grab-site
21:44 🔗 HCross swebb, I just gor IPbanned from them
21:44 🔗 HCross got
21:44 🔗 swebb Crap
21:45 🔗 HCross and for some reason, grab site has decided to crawl my hosting provider's website
21:46 🔗 HCross Ill reboot it, and see what its done
21:46 🔗 swebb I'm not banned as far as I can tell
21:46 🔗 swebb You may have been hitting them too hard.
21:47 🔗 HCross Yea. Ill slow down
21:47 🔗 HCross ive rebooted, and they are happy and its downloading
22:00 🔗 schbirid has quit (Quit: Leaving)
22:00 🔗 HCross swebb, how are you getting the files to the IA?
22:01 🔗 swebb https://pypi.python.org/pypi/internetarchive
22:01 🔗 swebb Works really well.
22:02 🔗 HCross Yea. Can you just feed it a folder and say "upload all this"
22:02 🔗 swebb Yep
22:02 🔗 swebb you can overwrite files too
22:02 🔗 HCross ah cool. I was in the middle of writing some hugely complex script to do it, but then realised there must be a better way. Ive used that library before, forgot it exists
22:03 🔗 swebb I started uploading and then re-started it again and it just overwrote all of the old files with the same filename just as you would expect. I was afraid that it would make new copies for dupe filenames.
22:03 🔗 HCross Ill wait, as with the videos, there is a large gap as it caches them, before it makes the warc
22:04 🔗 HCross ive got 2tb, and 100Mbps connection, so ill probably be able to upload in the morning
22:07 🔗 HCross ooh, swebb can you tell it to delete the file after its uploaded?
22:11 🔗 swebb I don't believe so.
22:12 🔗 HCross Thanks
22:13 🔗 swebb Add --metadata="mediatype:web" --metadata="collection:archiveteam" to your upload
22:13 🔗 swebb and --metadata="subject:archiveteam"
22:13 🔗 HCross Am I allowed to throw directly into the archiveteam collection?
22:14 🔗 arkiver no
22:14 🔗 arkiver Upload it to opensource and let SketchCow move it
22:14 🔗 HCross throught so, will do that arkiver
22:14 🔗 HCross thought
22:14 🔗 swebb oh, ok
22:27 🔗 SketchCow Yes
22:27 🔗 SketchCow (Do it that way)
22:59 🔗 JetBalsa (~jetbalsa@[redacted]) has joined #archiveteam-bs
23:15 🔗 Start_ (~Start@[redacted]) has joined #archiveteam-bs
23:15 🔗 Start has quit (Read error: Connection reset by peer)
23:17 🔗 ndiddy (~Nathan@[redacted]) has joined #archiveteam-bs
23:19 🔗 logan2 (~a@[redacted]) has joined #archiveteam-bs
23:20 🔗 logan has quit (Ping timeout: 252 seconds)
23:49 🔗 Fletcher HCross, internetarchive includes --delete
23:50 🔗 Fletcher or delete=True if you're importing the library
23:50 🔗 HCross Thanks
23:51 🔗 HCross Ideally I want to point it at a directory, and then say "upload all the warcs in all subfolders here. The metadata is xxx, and then delete after". I think ive got a script that does that
23:52 🔗 Fletcher absurdly easy if they're all going into one item, slightly harder if not
23:53 🔗 ndiddy has quit (Read error: Connection reset by peer)
23:53 🔗 HCross yea, its all in one. Easy as pie
23:54 🔗 HCross uploading 4 hour old files, so I know they are done
23:54 🔗 HCross https://harrycross.me/ecd.png
23:54 🔗 ndiddy (~Nathan@[redacted]) has joined #archiveteam-bs
23:54 🔗 HCross ^ that code is probably completely rubbish
23:57 🔗 snape I am slightly bemused that you found it easier to screenshot and upload that than just copy and paste it into a file...
23:58 🔗 HCross ive got a little tool on my taskbar, I just select the area I want to show, it takes the shot, crops it, and uploads it
23:58 🔗 HCross and then copies the link to my clipboard
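What HCross describes (collect every WARC under a directory, skip files newer than a few hours so half-written ones are left alone, upload everything into one item, delete afterwards) could be sketched as follows. This is not the script in the screenshot. `find_finished_warcs`, `upload_all` and the four-hour default are assumptions for illustration. The `upload` function and its `metadata` and `delete` arguments belong to the internetarchive library swebb linked, and the collection follows arkiver's advice to upload to opensource.

```python
import os
import time


def find_finished_warcs(root, min_age_hours=4, now=None):
    """Return sorted paths of *.warc / *.warc.gz files under root whose
    modification time is at least min_age_hours in the past."""
    now = time.time() if now is None else now
    cutoff = now - min_age_hours * 3600
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".warc", ".warc.gz")):
                continue
            path = os.path.join(dirpath, name)
            # Recently modified files may still be written to; skip them.
            if os.path.getmtime(path) <= cutoff:
                found.append(path)
    return sorted(found)


def upload_all(identifier, paths, uploader=None):
    """Upload paths into a single item and ask for local deletion afterwards."""
    if uploader is None:
        # Imported lazily so the file-finding part works without the library.
        from internetarchive import upload as uploader
    metadata = {
        "mediatype": "web",
        "collection": "opensource",
        "subject": "archiveteam",
    }
    return uploader(identifier, files=paths, metadata=metadata, delete=True)
```

A call such as `upload_all("placeholder-item-name", find_finished_warcs("/path/to/grabs"))` would then push everything that has been untouched for four hours. The item name and path here are placeholders, and the library needs to be configured with IA credentials first.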