#archiveteam-bs 2016-03-08,Tue

Time Nickname Message
00:00 🔗 yipdw FalconK: I think you can try it out just by uploading to the Community Texts collection
00:00 🔗 yipdw collection id is opensource, I think
00:01 🔗 yipdw that's open to anyone with an IA account
00:01 🔗 yipdw I can't remember if your account needs special privileges to upload with mediatype web, SketchCow would know more
00:02 🔗 dxrt Should be able to set mediatype to web on any standard account - from my experience.
00:10 🔗 MrRadar When I've tried to set mediatype through the web interface it has blocked me
00:10 🔗 MrRadar At least for setting it to web
00:11 🔗 SketchCow YPi cam
00:11 🔗 dxrt I do it through curl, no probs
00:11 🔗 SketchCow You can't load directly into web.
00:11 🔗 zenguy has quit IRC (Read error: Operation timed out)
00:12 🔗 dxrt collection opensource, mediatype web?
00:12 🔗 SketchCow Try, but I don't know
00:12 🔗 dxrt works for me at least.
00:15 🔗 JW_work there's a *web* collection ( https://archive.org/details/web ) which is different than the "web" mediatype . This is particularly confusing because there appears to be magic that makes the few items whose identifiers are the name of a mediatype also appear to contain all the items with that mediatype.
00:16 🔗 zenguy has joined #archiveteam-bs
00:24 🔗 JW_work it certainly looks like there is no restriction on giving things the web mediatype. See for example, this: https://archive.org/details/warc-files.tjw.moe
00:25 🔗 JW_work or even more so: https://archive.org/details/heckert_gnu_png
00:34 🔗 FalconK well, alright. I'll test it with opensource, and then submit a pull request.
00:35 🔗 FalconK I also have a pull request on the megawarc assembler - don't use cleartext HTTP and IA's authorization header at the same time!
00:35 🔗 FalconK unless you believe in passing cleartext passwords over networks ;)
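A minimal sketch of the guard FalconK describes: refuse to attach credentials unless the upload URL is HTTPS. The function name, the example endpoint, and the `LOW key:secret` header format are assumptions for illustration; the log does not show the actual megawarc patch.

```python
from urllib.parse import urlsplit


def build_auth_headers(url, access_key, secret_key):
    """Return an authorization header, but only for HTTPS URLs.

    Hypothetical helper: the header format is an assumption, not taken
    from the log.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme != "https":
        # Sending the secret over plain HTTP would expose it on the wire.
        raise ValueError(f"refusing to send credentials over {scheme!r}")
    return {"authorization": f"LOW {access_key}:{secret_key}"}
```

Raising instead of silently upgrading the URL keeps the failure visible, so a misconfigured uploader is noticed rather than leaking a key.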
00:35 🔗 ohhdemgir has quit IRC (Read error: Operation timed out)
00:36 🔗 w0rp has quit IRC (Read error: Operation timed out)
00:37 🔗 ohhdemgir has joined #archiveteam-bs
00:38 🔗 w0rp has joined #archiveteam-bs
01:16 🔗 Stiletto has quit IRC ()
01:17 🔗 Stiletto has joined #archiveteam-bs
02:20 🔗 zenguy has quit IRC (Read error: Operation timed out)
02:23 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:23 🔗 zenguy has joined #archiveteam-bs
02:27 🔗 dashcloud has joined #archiveteam-bs
02:51 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:55 🔗 dashcloud has joined #archiveteam-bs
03:11 🔗 JesseW has joined #archiveteam-bs
03:34 🔗 ErkDog http://puu.sh/nyMLP/74d28d17ac.png wheee
03:37 🔗 tomwsmf-a has joined #archiveteam-bs
03:39 🔗 Start has quit IRC (Read error: Connection reset by peer)
03:40 🔗 Start has joined #archiveteam-bs
03:40 🔗 JesseW has quit IRC (Quit: Leaving.)
03:47 🔗 Start has quit IRC (Quit: Disconnected.)
03:49 🔗 Start has joined #archiveteam-bs
03:57 🔗 bwn has quit IRC (Read error: Operation timed out)
04:04 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
04:17 🔗 JesseW has joined #archiveteam-bs
04:20 🔗 fie has quit IRC (Read error: Connection reset by peer)
04:20 🔗 bwn has joined #archiveteam-bs
04:34 🔗 ErkDog sadness :(
04:34 🔗 ErkDog http://puu.sh/nyPYB/16d1d24031.png
04:35 🔗 ErkDog http://puu.sh/nyQ1e/e493a50a76.png 17 hours to upload one work unit :(
04:49 🔗 yipdw are you still hung up on the rsync thing
04:50 🔗 xmc computerwise or ontologically
04:50 🔗 yipdw yes
05:12 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
05:19 🔗 Sk1d has joined #archiveteam-bs
06:02 🔗 Stiletto has quit IRC (Read error: Operation timed out)
06:02 🔗 Stiletto has joined #archiveteam-bs
06:17 🔗 Sk2d has joined #archiveteam-bs
06:22 🔗 Sk1d has quit IRC (hub.se irc.du.se)
06:32 🔗 FalconK ErkDog: the patch for that is in
06:33 🔗 FalconK just need to bust out the login creating and permission granting and server updating
06:33 🔗 FalconK as tempting as it is to direct all the ananiel pipeline stuff (full disk :/) to collection opensource
06:33 🔗 FalconK also, I tried to commit into archivebot, and yes, one needs the blessing
06:34 🔗 FalconK SketchCow: WTB 1x commit access to collection archivebot for user FalconK
06:37 🔗 Sk2d is now known as Sk1d
06:43 🔗 yipdw FalconK: ErkDog's thing looks like a Warrior project, which doesn't use the ArchiveBot uploader
06:52 🔗 metalcamp has joined #archiveteam-bs
06:57 🔗 JesseW I don't think there's anything wrong with putting the stuff in the opensource collection.
06:57 🔗 JesseW It can be moved later.
06:57 🔗 xmc FalconK: right. you should add a tag or whatever it's called 'archivebot'
06:59 🔗 JesseW I think tags are called "subject"
07:00 🔗 xmc maybe
07:00 🔗 xmc or keywords
07:07 🔗 vitzli has joined #archiveteam-bs
07:16 🔗 mismatch_ has quit IRC (Remote host closed the connection)
07:17 🔗 mismatch_ has joined #archiveteam-bs
07:26 🔗 JesseW has quit IRC (Quit: Leaving.)
08:24 🔗 schbirid has joined #archiveteam-bs
08:36 🔗 bwn has quit IRC (Read error: Operation timed out)
08:52 🔗 Boppen has joined #archiveteam-bs
09:02 🔗 bwn has joined #archiveteam-bs
09:10 🔗 godane has quit IRC (Read error: Operation timed out)
09:33 🔗 godane has joined #archiveteam-bs
10:12 🔗 metalcamp has quit IRC (Ping timeout: 258 seconds)
11:20 🔗 jspiros has quit IRC (leaving)
11:26 🔗 jspiros has joined #archiveteam-bs
11:47 🔗 metalcamp has joined #archiveteam-bs
11:49 🔗 godane has quit IRC (Quit: Leaving.)
12:15 🔗 metalcamp has quit IRC (Ping timeout: 258 seconds)
12:19 🔗 godane has joined #archiveteam-bs
12:21 🔗 RichardG has quit IRC (Read error: Operation timed out)
12:47 🔗 Smiley midas: for some reason it didn't show up lols
12:55 🔗 Smiley has quit IRC (Remote host closed the connection)
13:02 🔗 Smiley has joined #archiveteam-bs
13:42 🔗 RichardG has joined #archiveteam-bs
14:18 🔗 wacky has joined #archiveteam-bs
14:54 🔗 pgoetz has quit IRC (Remote host closed the connection)
15:01 🔗 Start has quit IRC (Quit: Disconnected.)
15:03 🔗 godane has quit IRC (Read error: Operation timed out)
15:04 🔗 w0rp has quit IRC (Read error: Operation timed out)
15:04 🔗 closure has quit IRC (Read error: Operation timed out)
15:04 🔗 godane has joined #archiveteam-bs
15:04 🔗 beardicus has quit IRC (Read error: Operation timed out)
15:05 🔗 closure has joined #archiveteam-bs
15:05 🔗 midas sets mode: +o closure
15:05 🔗 beardicus has joined #archiveteam-bs
15:06 🔗 w0rp has joined #archiveteam-bs
15:20 🔗 pgoetz has joined #archiveteam-bs
15:29 🔗 SketchCow I got a new Ultra-High-Def monitor, so you're all doomed.
15:29 🔗 SketchCow I see EVERYTHING
15:30 🔗 midas has ultra-high-def monitor, still runs mame at 640x480
15:44 🔗 Start has joined #archiveteam-bs
16:15 🔗 pgoetz has quit IRC (Remote host closed the connection)
16:18 🔗 ersi has quit IRC (Read error: Operation timed out)
16:20 🔗 ersi has joined #archiveteam-bs
16:20 🔗 midas sets mode: +o ersi
16:20 🔗 swebb sets mode: +o ersi
16:48 🔗 JesseW has joined #archiveteam-bs
16:49 🔗 pgoetz has joined #archiveteam-bs
17:07 🔗 Start has quit IRC (Quit: Disconnected.)
17:11 🔗 JesseW has quit IRC (Quit: Leaving.)
17:11 🔗 vitzli has quit IRC (Leaving)
17:16 🔗 metalcamp has joined #archiveteam-bs
17:46 🔗 SimpBrain wow scaleway not mucking about with cloud server prices
18:13 🔗 ErkDog holy crap yeah
18:13 🔗 HCross Yea, but their network speed isnt good
18:13 🔗 ErkDog cause of the 300Mbit?
18:14 🔗 HCross nope, because they oversell
18:15 🔗 ErkDog ahhh so saturated
18:15 🔗 Frogging overselling can hit I/O and CPU performance too
18:15 🔗 schbirid has quit IRC (Quit: Leaving)
18:15 🔗 ErkDog LOL plus they publically advertise hey run torrents
18:15 🔗 ErkDog https://www.scaleway.com/imagehub/torrents/
18:16 🔗 SimpBrain good for private sites
18:16 🔗 ErkDog yeah unmetered servers, run torrents, that will make a good experience for all
18:16 🔗 Frogging cloud to butt is fun
18:16 🔗 Frogging http://archiveteam.org/index.php?title=User_talk:Jscott
18:17 🔗 Frogging "This is partly "fuck my butt" and partly "archive team" related"
18:17 🔗 SimpBrain saturated pipes everywhere
18:25 🔗 tomwsmf-a has joined #archiveteam-bs
18:28 🔗 joepie91 SimpBrain: how dare you use the word 'cloud'
18:28 🔗 joepie91 ;)
18:28 🔗 SimpBrain well it's not a physical dedicated server :P
18:29 🔗 Frogging butt server
18:30 🔗 SimpBrain tbh going into the future, it should be like cloud dedis especially for tiny companies and individuals, why do you physically need something physical to say it's yours,
18:33 🔗 Frogging That's pretty much the way it is already. But dedicated physical servers have advantages, such as not sharing system resources with other users, and you can usually get a whole hard disk to yuorself
18:33 🔗 Frogging yourself*
18:34 🔗 Frogging I have a VPS and a dedicated server, because sometimes I need more than 24GB of disk space and I don't want to pay $100/month for a higher VPS tier
18:35 🔗 SimpBrain yeah hdd space is what is killing vps for small time use
18:36 🔗 yipdw dedicated physical server is so nice because I can be super-lazy in my Xen allocations and not give a shit
18:36 🔗 yipdw "how much for gitlab? fuck it, 8 gigs"
18:37 🔗 Frogging VPS definitely has its place though. They're very flexible and scalable
18:38 🔗 Frogging as with most things, it's not black-and-white "doing it this way is unquestionably better at everything"
18:38 🔗 yipdw these days I read "flexible" as "fuck you, do it yourself" and "scalable" as "fuck you, pay us more for more nodes"
18:39 🔗 yipdw if you're on EC2 both are literally that
18:39 🔗 Frogging yipdw: Hah. Yeah it's a bit buzzwordy
18:39 🔗 xmc hahaha yes
18:39 🔗 joepie91 [19:28] <SimpBrain> well it's not a physical dedicated server :P
18:39 🔗 Frogging But I more meant that you can start an instance and do some stuff and then get rid of it without paying a setup fee up front
18:39 🔗 joepie91 scaleway? it absolutely is
18:39 🔗 joepie91 the ARM pxes anyway
18:39 🔗 joepie91 er
18:39 🔗 joepie91 boxes
18:40 🔗 joepie91 yep
18:40 🔗 Frogging If I want to test some shit on a clean system with a clean connection, I just click "new Linode"
18:40 🔗 yipdw I am also very annoyed at tracking down this one memory leak that is causing a load balancer to trigger scaling notifications which is causing an autoscaling group to go haywire
18:40 🔗 joepie91 still ARM boxes
18:40 🔗 yipdw so I am probably biased
18:40 🔗 joepie91 SimpBrain: anyhow, "cloud" doesn't mean anything anyway
18:40 🔗 joepie91 it's either a physical server, or a VM, and it might have hourly billing, or have an API for spinning them up
18:40 🔗 SimpBrain yeah
18:40 🔗 joepie91 or have high availability
18:40 🔗 joepie91 or geographic redundancy
18:40 🔗 joepie91 or a SAN
18:40 🔗 joepie91 and any of these things might be indicated with 'cloud'
18:40 🔗 Frogging joepie91: nah man
18:40 🔗 joepie91 in any combination
18:40 🔗 schbirid has joined #archiveteam-bs
18:40 🔗 joepie91 :p
18:40 🔗 Frogging it's literally in the clouds
18:40 🔗 Frogging there's nothing physical about it
18:41 🔗 joepie91 it's a meaningless buzzword basically
18:41 🔗 Frogging to be fair, it has some degree of meaning. Unlike "internet of things"
18:41 🔗 joepie91 no, it really doesn't
18:41 🔗 yipdw I store my files in a bong
18:41 🔗 yipdw personal cloud
18:42 🔗 Frogging i store my files in my butt
18:42 🔗 yipdw anyway I don't know where this conversation started, what is it about
18:42 🔗 Frogging don't remember :p
18:42 🔗 * Frogging scrolls up
18:43 🔗 Frogging SimpBrain said something about Scaleway
18:43 🔗 ErkDog FOS Makes me so sad :(
18:43 🔗 Frogging [12:46:43] <@SimpBrain> wow scaleway not mucking about with cloud server prices
18:43 🔗 ErkDog 80Kbps :(
18:43 🔗 yipdw fos has served us all well for years
18:43 🔗 * SimpBrain hides
18:43 🔗 ErkDog I've got 45G of data waiting to be dumped.... :-/
18:44 🔗 ErkDog like Wiki and GameTracker would be done if we could dump it somewhere, lol
18:44 🔗 ErkDog or at least "caught up"
18:47 🔗 SimpBrain gametrailers really hit fos hard
18:47 🔗 SimpBrain didnt help we was archiving 4 sites at the time i think
18:47 🔗 ErkDog lol gametrailers is a massive amount of data
18:47 🔗 yipdw fos is not getting slammed like it was, maybe there's been some controls put on it
18:48 🔗 yipdw anyway the fos-to-ErkDog connection doesn't seem like the best either https://gist.github.com/yipdw/07994326c74c7ffa16e6
18:48 🔗 ErkDog well I get 80K/sec here and about 125 from the server I am using
18:48 🔗 ErkDog skyrim.towfowi.net
18:48 🔗 SketchCow I'm going to revisit FOS and its connection when I get there.
18:48 🔗 ErkDog ohhhh, yeah that's the trace you did
18:48 🔗 yipdw it could be either end, I think blaming it on fos is premature
18:48 🔗 ErkDog i'm on ha.wa.ecansol.net
18:49 🔗 SketchCow No, FOS is definitely doing something.
18:49 🔗 SketchCow Something bad.
18:49 🔗 ErkDog poor FOSy :(
18:49 🔗 ErkDog or BAD FOSy whatever the case may be ;-D
18:49 🔗 SketchCow Part of it, of course, are the people going "Oh, it's not working fast, LET ME PUT 20 SIMULTANEOUS CONNECTIONS ON IT THAT WILL FIX IT"
18:49 🔗 phuzion Yeah, I'm hovering between 115 and 130KB/s going to FOS.
18:49 🔗 SketchCow Luckily I can't tell who does this, which is why they are still alive
18:49 🔗 ErkDog yeah cause they don't understand the idea of IO thrashing
18:49 🔗 ErkDog SketchCow you should be able to
18:49 🔗 ErkDog an incoming rsynch shows as a process doesn't ?
18:50 🔗 SketchCow No, if I do it, I'll just start murdering you fucks
18:50 🔗 SketchCow All of you
18:50 🔗 ErkDog LOL
18:50 🔗 ErkDog well you can only complain so much, if you want to download all the internet, you have to give us a place to put it bro
18:50 🔗 xmc eep
18:50 🔗 SketchCow It'll be me and a room of corpses and me with a machete going "good meeting, good meeting" and chewing a sour patch kid
18:50 🔗 phuzion hahaha
18:51 🔗 ErkDog netstat -alnp|grep #### where ### is the port of your incoming rsynch connections will tell you too
18:51 🔗 ErkDog at least it would tell you the # of connections from that IP, but not who owns it
18:51 🔗 ErkDog but you could firewall off people who have 1,000 processes running and when they ask why they can't upload stuff, we can explain to them, that they need ONE process per Project, per server, at -most-
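The per-IP count ErkDog suggests pulling from `netstat` can be sketched as below. This assumes Linux-style `netstat -an` columns (proto, recv-q, send-q, local address, foreign address, state) and uses port 873, the usual rsync daemon port, purely as a stand-in for the unspecified `####`; the sample lines are made up.

```python
from collections import Counter


def connections_per_ip(netstat_lines, port):
    """Count ESTABLISHED TCP connections to a local port, keyed by remote IP."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Skip headers and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # The port is whatever follows the last colon of the local address.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


# Illustrative input only; addresses are from documentation ranges.
sample = [
    "tcp        0      0 192.0.2.10:873     203.0.113.5:51234   ESTABLISHED",
    "tcp        0      0 192.0.2.10:873     203.0.113.5:51240   ESTABLISHED",
    "tcp        0      0 192.0.2.10:873     198.51.100.7:40022  ESTABLISHED",
    "tcp        0      0 192.0.2.10:22      198.51.100.7:40100  ESTABLISHED",
    "tcp        0      0 0.0.0.0:873        0.0.0.0:*           LISTEN",
]
counts = connections_per_ip(sample, 873)
```

As yipdw notes right after, a per-IP count says little when one person spreads warriors across a large subnet, so this is a diagnostic aid rather than a defense.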
18:52 🔗 yipdw I considered doing that and it is much harder to maintain than just finding the people and asking them to back off a bit
18:53 🔗 ErkDog couldn't you just tell the rsync/ssh protocol to only allow 2 connections per IP?
18:53 🔗 yipdw yes but it's not a relevant defense
18:54 🔗 yipdw not when you have some people who have access to large subnets and are running warriors on all of them
18:54 🔗 ErkDog true
18:54 🔗 ErkDog but when I look at the trackers
18:54 🔗 yipdw anyway, Atluxity is running a lot of traffic to fotolog
18:54 🔗 ErkDog I only see like 5 or so people active on any given project
18:54 🔗 yipdw yeah it's one person with a large number of nodes
18:55 🔗 Frogging so it's people running a bunch of warriors on one machine that's hammering FOS?
18:55 🔗 yipdw many warriors on many machines
18:55 🔗 Frogging Is more warriors not better?
18:55 🔗 Frogging Or are they doing it wrong
18:55 🔗 yipdw more warriors is fine but there are limits to how fast we can take stuff in
18:55 🔗 yipdw this is just a limit
18:55 🔗 yipdw find why and work around it, etc
18:56 🔗 SketchCow I'm going to reboot the box.
18:56 🔗 yipdw I also hate the word "scalable" because it gets people excited for no fucking reason
18:56 🔗 SketchCow I do see that the upload speed just skyrocketed.
18:56 🔗 Frogging Perhaps the system could be adjusted so that FOS coordinates who is uploading what and when
18:57 🔗 yipdw SketchCow: you might want to hold off, it looks like DFJustin's doing a compile
18:57 🔗 ErkDog well likely the bottleneck is disk I/O
18:57 🔗 SketchCow He is ALWAYS doing a compile
18:57 🔗 yipdw oh ok never mind
18:57 🔗 SketchCow STOP BEING MY MECHANICS FOR A MOMENT
18:57 🔗 SketchCow I have two torrents going on the box, I'm trying to shut them down and avoid living a pile of buff
18:58 🔗 Frogging Maybe instead of warriors uploading things ASAP they could upload when FOS asks them to, to limit load
18:58 🔗 ErkDog because as you add additional incoming rsynchs, the speed of all the existing transfers is diminished significantly, so 10 RSynchs take more than 10 times as long to complete as a single RSynch
18:58 🔗 yipdw so
18:59 🔗 SketchCow No, no. The problem is just a matter of the fact that the machine got extended at one point and it never, ever goes back.
18:59 🔗 SketchCow And then people "do things"
18:59 🔗 SketchCow I wish I knew the command in rtorrent to say "and delete the data"
18:59 🔗 Frogging https://www.youtube.com/watch?v=EHybN9UbhWM
19:01 🔗 * ersi scales yipdw
19:03 🔗 ErkDog If you want to delete data on remove I would suggest adding the below to your rtorrent.rc. It will be both faster and more robust than rutorrent's delete function (which relies on php and a forked process) and has the benefit of not crashing rtorrent since it remembers state instead.
19:03 🔗 ErkDog method.set_key = event.download.erased, remove_file,"execute={rm,-drf,--,$d.get_base_path=}"
19:03 🔗 PurpleSym rsyncd is able to execute a script before starting a transfer. One could check the current load and stop the transfer if it is too high.
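PurpleSym's idea could look roughly like the sketch below: a script that rsyncd runs before each transfer (via the `pre-xfer exec` module parameter in rsyncd.conf) and that returns a non-zero status when the machine is busy, which makes the daemon abort the transfer. The threshold value and function names are hypothetical, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os
import sys

MAX_LOAD = 4.0  # hypothetical 1-minute load threshold; tune per machine


def should_accept(load_1min, max_load=MAX_LOAD):
    """Accept a new transfer only while the load is at or below the limit."""
    return load_1min <= max_load


def main():
    """Return 0 to let rsyncd proceed, 1 to make it reject the transfer."""
    load_1min, _, _ = os.getloadavg()
    if not should_accept(load_1min):
        print(f"load {load_1min:.2f} too high, try again later", file=sys.stderr)
        return 1
    return 0

# As a standalone pre-xfer script, the entry point would end with:
#     sys.exit(main())
```

Clients that get rejected would need to retry with some backoff, otherwise the rejected uploads simply pile up on the sender side.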
19:05 🔗 SketchCow Ha ha ha
19:05 🔗 SketchCow HEY GUESS WHAT GUYS
19:05 🔗 SketchCow I just found out there's a scheduled reboot of FOS anyway at 7pm EST
19:08 🔗 bwn has quit IRC (Read error: Operation timed out)
19:09 🔗 xmc bhahaha
19:12 🔗 ErkDog lol
19:22 🔗 Start has joined #archiveteam-bs
19:27 🔗 bwn has joined #archiveteam-bs
19:48 🔗 SketchCow OK, so I'm going to see about shutting down my torrenting, cleaning up a few things, and then we get the reboot
19:59 🔗 SN4T14 has quit IRC (Remote host closed the connection)
20:02 🔗 SN4T14 has joined #archiveteam-bs
20:23 🔗 wacky Don't suppose anyone from the IA could gimme 5 min of time to hit a few questions off of them
20:24 🔗 ErkDog sweet SketchCow thanks :-D
20:26 🔗 JW_work wacky: toss your questions here — the worst that will happen is none of us will know or be willing to answer.
20:29 🔗 wacky I work for a commercial archiving solution. We have a client (end user who owns the originally archived content) who is looking to get some content. As the original content owner, is it possible for them to get a warc/warc export?
20:29 🔗 wacky They would have no problem paying for such a service
20:29 🔗 MrRadar That's a question that would need to be addressed to the IA directly.
20:30 🔗 MrRadar If it's content from the IA's Wayback Machine
20:30 🔗 JW_work I'd suggest sending that question to info@archive.org, providing (in the initial email) the specific URLs you are interested in, and whatever proof you have that you represent the original content owner. I have no idea whether that would be feasible, but it seems reasonable to me.
20:32 🔗 MrRadar If it's something that we (the Archiveteam) archived then the WARCs should already be available for download from the IA
20:34 🔗 JW_work good point. You can look up archivebot stuff with the viewer; for other stuff … probably search the wiki to see if it was a project.
20:35 🔗 MrRadar For reference, the ArchiveBot viewer is here: http://archive.fart.website/archivebot/viewer/
20:37 🔗 wacky Awesome - thanks all! Ill give the suggestions a shot
20:38 🔗 JW_work cool, glad we were able to give you some pointers
20:39 🔗 ErkDog sigh fart.website
20:40 🔗 ErkDog lol a lot of the things archivebot is working on don't seem like "small" websites
20:40 🔗 ErkDog one is @ 54 gigs, lol
20:41 🔗 MrRadar Scroll down to the bottom of the dashboard to see some *really* big jobs
20:41 🔗 ErkDog yeah lol one is 100 gigs, that one is 999 gigs?
20:44 🔗 Start has quit IRC (Quit: Disconnected.)
20:46 🔗 SketchCow 54gb is small
20:49 🔗 phuzion 54gb is tiny. I have a flash drive with more than 54gb of usable capacity.
20:49 🔗 phuzion Actually, I have like 3 or 4 laying around.
20:50 🔗 ErkDog LOL well I guess it depends on how you look at it
20:50 🔗 ErkDog since -most- websites are like super tiny compared to that
20:50 🔗 ErkDog we run a hosting company
20:50 🔗 ErkDog our customer's largest site is 1.5 gigs, and it's eCommerce
20:51 🔗 MrRadar Keep in mind that the ArchiveBot saves web requests not necessarily what would be stored on the server
20:51 🔗 MrRadar If you had a PHP script that printed an endless stream of random numbers that would be small on disk but the response would be huge
20:52 🔗 MrRadar For full-site grabs we also tend to target sites that have lots of interesting stuff to save
20:55 🔗 ErkDog hmmm true
21:04 🔗 schbirid has quit IRC (Quit: Leaving)
21:35 🔗 FalconK SketchCow: so do you want me to upload things to opensource with a special tag? or somewhere else?
21:41 🔗 FalconK meh whatever I'll just upload them with subject: archivebot for now and we can always make more changes if desirable.
21:43 🔗 VADemon has joined #archiveteam-bs
21:45 🔗 xmc yep
21:45 🔗 xmc as long as they're separable from everything else
21:52 🔗 arkiver SketchCow: any taks this year in the Netherlands?
21:52 🔗 arkiver talks*
21:53 🔗 SketchCow None planned, but then again this is the year I planned for not doing much speaking/travel except the Japan trip
21:54 🔗 godane i figure a telethon at the end of the year at IA
21:57 🔗 fie has joined #archiveteam-bs
21:57 🔗 FalconK ok, much, much better
21:57 🔗 FalconK getting 5 mbit up into IA
21:57 🔗 FalconK the uploads are collection: opensource, subject: archivebot
21:58 🔗 FalconK content-type: web
21:58 🔗 FalconK who moves them?
22:00 🔗 yipdw if you can hold off the uploads until we can get that sorted out, that'd be nice
22:00 🔗 yipdw I don't think the viewer will find those
22:00 🔗 yipdw (until they get in the right place)
22:07 🔗 metalcamp has quit IRC (Ping timeout: 258 seconds)
22:10 🔗 dashcloud has quit IRC (Read error: Operation timed out)
22:14 🔗 dashcloud has joined #archiveteam-bs
22:15 🔗 FalconK argh, since I already started, I can't.
22:16 🔗 xmc no worries
22:16 🔗 xmc items can always be moved
22:16 🔗 xmc it's easy
22:17 🔗 FalconK it looks like mostly a metadata change yes
22:17 🔗 xmc yup
22:17 🔗 xmc metamgr can do it i think?
22:17 🔗 FalconK on the bright side, my pipeline is emptying out now and actually crawling things again
22:18 🔗 FalconK so
22:18 🔗 FalconK who are the individuals that are needed to sort it out?
22:18 🔗 xmc what's your IA account email address?
22:18 🔗 FalconK falcon@falconk.rocks
22:18 🔗 xmc nerd
22:19 🔗 xmc https://archive.org/details/archiveteam_archivebot_go_falconk_test_20160307www_youtube_com_20160306 this thingy
22:19 🔗 xmc your item name is kind of fucky
22:19 🔗 FalconK yes, that was the test item
22:19 🔗 xmc ah
22:20 🔗 FalconK there is another, which was just uploaded, but isn't showing up under my uploads page
22:20 🔗 FalconK though I recall it taking a moment
22:21 🔗 yipdw oh, right, that's the main problem with distributed upload
22:21 🔗 yipdw s
22:21 🔗 yipdw naming
22:21 🔗 xmc can you get to metamgr with your account http://archive.org/metamgr.php?&w_uploader=falcon@falconk.rocks
22:22 🔗 dxrt Just my 2c on this whole thing -I don't really want all the random crap my pipeline has grabbed to show up under my user account and linked to me - especially if something questionable is discovered later, it kind of seems like it'll be my liability and 'my upload' rather than the current system.
22:23 🔗 FalconK xmc: not authorized
22:23 🔗 xmc ok
22:23 🔗 yipdw dxrt: as far as I can tell, the rsync mode still exists
22:23 🔗 FalconK yes
22:23 🔗 FalconK this change is very optional
22:23 🔗 yipdw I am however wondering how to name these items
22:23 🔗 yipdw the time-sequence thing doesn't work anymore
22:23 🔗 yipdw and UUID is not a solution
22:23 🔗 dxrt Right! I thought it was a current re-work of the current uploader, but I'm happy to hear that!
22:23 🔗 xmc archivebot_username_date ?
22:24 🔗 yipdw maybe, assuming username keeps all their clocks in sync
22:24 🔗 FalconK so the way I am naming them now is like archiveteam_archivebot_go_falconk_content_radiosega_net_20160307
22:24 🔗 xmc well is it a problem to put them in somewhat incorrect items
22:24 🔗 xmc because timestamps exist in the datas
22:24 🔗 FalconK for a crawl of content.radiosega.net which the crawler named with 20160307 as the date in the filename
22:24 🔗 arkiver SketchCow: ok, the little archiveteam meeting last year was nice. We got some new project out of it too
22:24 🔗 xmc i thought you were doing one item per day per pipeliner
22:25 🔗 FalconK well I thought of doing that and then I wondered why I was associating items which had no logical association except that they were gathered proximally
22:26 🔗 FalconK I mean the item name is pretty arbitrary right?
22:26 🔗 xmc yes
22:26 🔗 xmc it comes down to semantics really
22:26 🔗 xmc i guess there's nothing wrong with item per archivebot job
22:26 🔗 yipdw yeah I guess in the end I'm ok with that
22:26 🔗 xmc but we do a bunch of single-page grabs too
22:26 🔗 * xmc shrug
22:27 🔗 yipdw provided the viewer can find them
22:27 🔗 * yipdw checks
22:27 🔗 FalconK there would be something wrong with one item per 5gb chunk
22:27 🔗 FalconK I doubt the viewer will find them until they are moved into a blessed collection
22:27 🔗 xmc an item per job, containing one or many warcs
22:27 🔗 FalconK they're in opensource with type web currently
22:27 🔗 xmc sounds good to me
22:27 🔗 yipdw oh, I meant that the viewer doesn't add additional criteria on top of collection
22:27 🔗 yipdw like /[0-9]+/
22:27 🔗 FalconK oh
22:27 🔗 yipdw I think the answer is no? but I haven't checked
22:27 🔗 FalconK I hope it doesn't!
22:28 🔗 yipdw ok the answer is probably "it's fine"
22:28 🔗 FalconK it would be good to know but I have no way to verify
22:28 🔗 FalconK cool
22:28 🔗 FalconK I can help with whatever bulk crap needs doing as a result of this
22:28 🔗 yipdw at least https://github.com/ArchiveTeam/ArchiveBot/blob/master/viewer/archivebotviewer/database.py#L417, to me, indicates that we're clear
22:29 🔗 FalconK archivebot identifiers already have _, and I am doing some string translation
22:29 🔗 ndiddy has quit IRC (Read error: Connection reset by peer)
22:29 🔗 yipdw it just occurred to me because we do have some tools that do that check
22:30 🔗 FalconK the translation is re.sub(r'[^0-9a-zA-Z-]+', '_', basename)
22:30 🔗 FalconK so DNS characters or _
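FalconK's translation can be tried out as follows. The `re.sub` pattern is quoted from the log; the helper names and the way the prefix, hostname, and date are joined are assumptions chosen to reproduce the item name mentioned earlier, not the actual uploader code.

```python
import re


def sanitize(basename):
    """Collapse every run of characters outside [0-9a-zA-Z-] into one underscore."""
    return re.sub(r'[^0-9a-zA-Z-]+', '_', basename)


def make_identifier(prefix, basename, date):
    """Hypothetical helper: join prefix, sanitized name, and date with underscores."""
    return f"{prefix}_{sanitize(basename)}_{date}"


item = make_identifier(
    "archiveteam_archivebot_go_falconk", "content.radiosega.net", "20160307"
)
# item == "archiveteam_archivebot_go_falconk_content_radiosega_net_20160307"
```

Because the pattern ends in `+`, consecutive disallowed characters become a single `_`, and hyphens pass through unchanged.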
22:30 🔗 ndiddy has joined #archiveteam-bs
22:30 🔗 yipdw yeah those'll be fine
22:30 🔗 yipdw AFAICT
22:30 🔗 FalconK :)
22:30 🔗 FalconK if not, we'll see.
22:32 🔗 FalconK and... wow, I have made a thing that uploads over 1GB per hour of internet into the archive.
22:32 🔗 yipdw nice
22:32 🔗 * FalconK enjoys this
22:32 🔗 yipdw the only other place I've seen that is on another rsync target we have
22:33 🔗 yipdw it's Kenshin's
22:33 🔗 FalconK well actually 1GB per 15min
22:33 🔗 yipdw as far as I can tell Kenshin basically owns Singapore
22:33 🔗 FalconK this just has 1gbps upstream
22:33 🔗 FalconK nothing special about it besides that
22:34 🔗 FalconK the transfer rate is really, really fluttery though
22:34 🔗 FalconK they end up looking like this:
22:34 🔗 FalconK https://archive.org/details/archiveteam_archivebot_go_falconk_content_radiosega_net_20160307
22:36 🔗 FalconK the non-viewability seems to be common to WARCs in opensource
22:36 🔗 yipdw yeah
22:37 🔗 yipdw the WARC also doesn't have extension .warc.gz for some reason
22:37 🔗 FalconK it is _warc_gz
22:37 🔗 FalconK hmm.
22:37 🔗 FalconK is that my doing?
22:38 🔗 yipdw it could be; IIRC wpull does .warc.gz
22:38 🔗 FalconK yes, it is my doing
22:38 🔗 yipdw I don't think it matters for derives (though maybe it does), but it can matter for browser downloads
22:38 🔗 yipdw and etc
22:38 🔗 FalconK let me fix that.
22:38 🔗 arkiver it matters for derives
22:38 🔗 yipdw oh
22:42 🔗 FalconK fixed. targets are now like /archiveteam_archivebot_go_falconk_content_radiosega_net_20160307/content.radiosega.net-inf-20160307-051602-1qvpq-00001.warc.gz
22:42 🔗 yipdw cool
22:42 🔗 FalconK now is there some way to rename the one extant misnamed file
22:44 🔗 FalconK ... probably not.
22:44 🔗 FalconK not by me anyway.
23:16 🔗 VADemon Does anyone know, is 1GB softlimit per WARC file still recommended for mirrors or should it be raised?
23:21 🔗 ErkDog ftp ftp RSynch target is fast
23:21 🔗 ErkDog the*
23:21 🔗 ErkDog I can dump @ 15M/sec from 2 different systems all day
23:28 🔗 FalconK ** rsync
23:29 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
23:30 🔗 ErkDog so you made it so you can upload directly into IA FalconK instead of having to rsynch it somewhere?
23:37 🔗 xXx_ndidd has joined #archiveteam-bs
23:38 🔗 fie_ has joined #archiveteam-bs
23:38 🔗 hawc145 has joined #archiveteam-bs
23:39 🔗 RichardG_ has joined #archiveteam-bs
23:42 🔗 phuz has joined #archiveteam-bs
23:42 🔗 Start has joined #archiveteam-bs
23:42 🔗 is-_ has joined #archiveteam-bs
23:42 🔗 ndiddy has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 dashcloud has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 fie has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 RichardG has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 ohhdemgir has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 yipdw has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 signius has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 HCross has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 ErkDog has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 chfoo has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 toad1 has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 JW_work has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 phuzion has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 is- has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 MrRadar has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 chazchaz has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 Laverne has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 SimpBrain has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 zino_ has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 Infreq has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 Darkstar has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 slyphic has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 Frogging has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 dcmorton has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 Cameron_D has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 dxrt has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 atlogbot has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 swebb has quit IRC (hub.efnet.us irc.servercentral.net)
23:42 🔗 Famicoma1 has quit IRC (Ping timeout: 270 seconds)
23:43 🔗 chazchaz_ has joined #archiveteam-bs
23:44 🔗 yipdw_ has joined #archiveteam-bs
23:44 🔗 * FalconK looks at the sadness that is efnet
23:44 🔗 dxrt_ has joined #archiveteam-bs
23:45 🔗 Infreq_ has joined #archiveteam-bs
23:45 🔗 ErkDog_ has joined #archiveteam-bs
23:45 🔗 swebb_ has joined #archiveteam-bs
23:45 🔗 Frogging_ has joined #archiveteam-bs
23:45 🔗 chfoo0 has joined #archiveteam-bs
23:46 🔗 zino__ has joined #archiveteam-bs
23:46 🔗 SimpBrai1 has joined #archiveteam-bs
23:48 🔗 FalconK ErkDog_: yes, I did.
23:48 🔗 FalconK (so we can forget that the correct spelling of the project name is rsync, or that it even exists, perhaps...) ;)
23:51 🔗 pi has joined #archiveteam-bs
23:55 🔗 pi is now known as MrRadar_
23:56 🔗 ErkDog_ lol
23:56 🔗 ErkDog_ soz :-D
23:57 🔗 ErkDog_ is now known as ErkDog
23:57 🔗 dashcloud has joined #archiveteam-bs
23:57 🔗 swebb_ is now known as swebb
23:57 🔗 Frogging_ is now known as Frogging
23:58 🔗 JW_work has joined #archiveteam-bs
23:58 🔗 MrRadar_ is now known as MrRadar
23:59 🔗 toad1 has joined #archiveteam-bs
23:59 🔗 slyphic has joined #archiveteam-bs
