#archiveteam-bs 2016-03-08,Tue

yipdwFalconK: I think you can try it out just by uploading to the Community Texts collection
collection id is opensource, I think
that's open to anyone with an IA account
I can't remember if your account needs special privileges to upload with mediatype web, SketchCow would know more
[00:00]
dxrtShould be able to set mediatype to web on any standard account - from my experience. [00:02]
MrRadarWhen I've tried to set mediatype through the web interface it has blocked me
At least for setting it to web
[00:10]
SketchCowYPi cam [00:11]
dxrtI do it through curl, no probs [00:11]
SketchCowYou can't load directly into web. [00:11]
***zenguy has quit IRC (Read error: Operation timed out) [00:11]
dxrtcollection opensource, mediatype web? [00:12]
SketchCowTry, but I don't know [00:12]
dxrtworks for me at least. [00:12]
JW_workthere's a *web* collection ( https://archive.org/details/web ) which is different than the "web" mediatype . This is particularly confusing because there appears to be magic that makes the few items whose identifiers are the name of a mediatype also appear to contain all the items with that mediatype. [00:15]
***zenguy has joined #archiveteam-bs [00:16]
JW_workit certainly looks like there is no restriction on giving things the web mediatype. See for example, this: https://archive.org/details/warc-files.tjw.moe
or even more so: https://archive.org/details/heckert_gnu_png
[00:24]
FalconKwell, alright. I'll test it with opensource, and then submit a pull request.
I also have a pull request on the megawarc assembler - don't use cleartext HTTP and IA's authorization header at the same time!
unless you believe in passing cleartext passwords over networks ;)
[00:34]
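FalconK's point about not combining cleartext HTTP with IA's authorization header can be sketched as a guard that only builds the header for https:// URLs. This is a minimal illustration, not the actual pull request: the function name `build_auth_headers` is hypothetical, and the `LOW access:secret` form is assumed from IA's S3-like API.

```python
from urllib.parse import urlsplit


def build_auth_headers(url, access_key, secret_key):
    """Return headers carrying IA S3-style credentials (hypothetical helper).

    Refuses to build them for anything but an https:// URL, so the
    secret never travels over the network in cleartext.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme != "https":
        raise ValueError(
            f"refusing to send credentials over {scheme or 'unknown'}://"
        )
    # Assumed header form for IA's S3-like API: "LOW <access>:<secret>".
    return {"Authorization": f"LOW {access_key}:{secret_key}"}
```

Calling it with an http:// URL raises `ValueError` before any request is made, which is the behaviour the message above asks for.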
***ohhdemgir has quit IRC (Read error: Operation timed out)
w0rp has quit IRC (Read error: Operation timed out)
ohhdemgir has joined #archiveteam-bs
w0rp has joined #archiveteam-bs
[00:35]
........ (idle for 38mn)
Stiletto has quit IRC ()
Stiletto has joined #archiveteam-bs
[01:16]
............. (idle for 1h3mn)
zenguy has quit IRC (Read error: Operation timed out)
dashcloud has quit IRC (Read error: Operation timed out)
zenguy has joined #archiveteam-bs
dashcloud has joined #archiveteam-bs
[02:20]
..... (idle for 24mn)
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
[02:51]
.... (idle for 16mn)
JesseW has joined #archiveteam-bs [03:11]
..... (idle for 23mn)
ErkDoghttp://puu.sh/nyMLP/74d28d17ac.png wheee [03:34]
***tomwsmf-a has joined #archiveteam-bs
Start has quit IRC (Read error: Connection reset by peer)
Start has joined #archiveteam-bs
JesseW has quit IRC (Quit: Leaving.)
[03:37]
Start has quit IRC (Quit: Disconnected.)
Start has joined #archiveteam-bs
[03:47]
bwn has quit IRC (Read error: Operation timed out) [03:57]
tomwsmf-a has quit IRC (Read error: Operation timed out) [04:04]
JesseW has joined #archiveteam-bs
fie has quit IRC (Read error: Connection reset by peer)
bwn has joined #archiveteam-bs
[04:17]
ErkDogsadness :(
http://puu.sh/nyPYB/16d1d24031.png
http://puu.sh/nyQ1e/e493a50a76.png 17 hours to upload one work unit :(
[04:34]
yipdware you still hung up on the rsync thing [04:49]
xmccomputerwise or ontologically [04:50]
yipdwyes [04:50]
..... (idle for 22mn)
***Sk1d has quit IRC (Ping timeout: 250 seconds) [05:12]
Sk1d has joined #archiveteam-bs [05:19]
......... (idle for 43mn)
Stiletto has quit IRC (Read error: Operation timed out)
Stiletto has joined #archiveteam-bs
[06:02]
.... (idle for 15mn)
Sk2d has joined #archiveteam-bs [06:17]
Sk1d has quit IRC (hub.se irc.du.se) [06:22]
FalconKErkDog: the patch for that is in
just need to bust out the login creating and permission granting and server updating
as tempting as it is to direct all the ananiel pipeline stuff (full disk :/) to collection opensource
also, I tried to commit into archivebot, and yes, one needs the blessing
SketchCow: WTB 1x commit access to collection archivebot for user FalconK
[06:32]
***Sk2d is now known as Sk1d [06:37]
yipdwFalconK: ErkDog's thing looks like a Warrior project, which doesn't use the ArchiveBot uploader [06:43]
***metalcamp has joined #archiveteam-bs [06:52]
JesseWI don't think there's anything wrong with putting the stuff in the opensource collection.
It can be moved later.
[06:57]
xmcFalconK: right. you should add a tag or whatever it's called 'archivebot' [06:57]
JesseWI think tags are called "subject" [06:59]
xmcmaybe
or keywords
[07:00]
***vitzli has joined #archiveteam-bs [07:07]
mismatch_ has quit IRC (Remote host closed the connection)
mismatch_ has joined #archiveteam-bs
[07:16]
JesseW has quit IRC (Quit: Leaving.) [07:26]
............ (idle for 58mn)
schbirid has joined #archiveteam-bs [08:24]
bwn has quit IRC (Read error: Operation timed out) [08:36]
.... (idle for 16mn)
Boppen has joined #archiveteam-bs [08:52]
bwn has joined #archiveteam-bs [09:02]
godane has quit IRC (Read error: Operation timed out) [09:10]
..... (idle for 23mn)
godane has joined #archiveteam-bs [09:33]
........ (idle for 39mn)
metalcamp has quit IRC (Ping timeout: 258 seconds) [10:12]
.............. (idle for 1h8mn)
jspiros has quit IRC (leaving) [11:20]
jspiros has joined #archiveteam-bs [11:26]
..... (idle for 21mn)
metalcamp has joined #archiveteam-bs
godane has quit IRC (Quit: Leaving.)
[11:47]
...... (idle for 26mn)
metalcamp has quit IRC (Ping timeout: 258 seconds)
godane has joined #archiveteam-bs
RichardG has quit IRC (Read error: Operation timed out)
[12:15]
...... (idle for 26mn)
Smileymidas: for some reason it didn't show up lols [12:47]
***Smiley has quit IRC (Remote host closed the connection) [12:55]
Smiley has joined #archiveteam-bs [13:02]
......... (idle for 40mn)
RichardG has joined #archiveteam-bs [13:42]
........ (idle for 36mn)
wacky has joined #archiveteam-bs [14:18]
........ (idle for 36mn)
pgoetz has quit IRC (Remote host closed the connection) [14:54]
Start has quit IRC (Quit: Disconnected.)
godane has quit IRC (Read error: Operation timed out)
w0rp has quit IRC (Read error: Operation timed out)
closure has quit IRC (Read error: Operation timed out)
godane has joined #archiveteam-bs
beardicus has quit IRC (Read error: Operation timed out)
closure has joined #archiveteam-bs
midas sets mode: +o closure
beardicus has joined #archiveteam-bs
w0rp has joined #archiveteam-bs
[15:01]
pgoetz has joined #archiveteam-bs [15:20]
SketchCowI got a new Ultra-High-Def monitor, so you're all doomed.
I see EVERYTHING
[15:29]
midashas ultra-high-def monitor, still runs mame at 640x480 [15:30]
***Start has joined #archiveteam-bs [15:44]
....... (idle for 31mn)
pgoetz has quit IRC (Remote host closed the connection)
ersi has quit IRC (Read error: Operation timed out)
ersi has joined #archiveteam-bs
midas sets mode: +o ersi
swebb sets mode: +o ersi
[16:15]
...... (idle for 28mn)
JesseW has joined #archiveteam-bs
pgoetz has joined #archiveteam-bs
[16:48]
.... (idle for 18mn)
Start has quit IRC (Quit: Disconnected.)
JesseW has quit IRC (Quit: Leaving.)
vitzli has quit IRC (Leaving)
[17:07]
metalcamp has joined #archiveteam-bs [17:16]
....... (idle for 30mn)
SimpBrainwow scaleway not mucking about with cloud server prices [17:46]
...... (idle for 27mn)
ErkDogholy crap yeah [18:13]
HCrossYea, but their network speed isnt good [18:13]
ErkDogcause of the 300Mbit? [18:13]
HCrossnope, because they oversell [18:14]
ErkDogahhh so saturated [18:15]
Froggingoverselling can hit I/O and CPU performance too [18:15]
***schbirid has quit IRC (Quit: Leaving) [18:15]
ErkDogLOL plus they publicly advertise hey run torrents
https://www.scaleway.com/imagehub/torrents/
[18:15]
SimpBraingood for private sites [18:16]
ErkDogyeah unmetered servers, run torrents, that will make a good experience for all [18:16]
Froggingcloud to butt is fun
http://archiveteam.org/index.php?title=User_talk:Jscott
"This is partly "fuck my butt" and partly "archive team" related"
[18:16]
SimpBrainsaturated pipes everywhere [18:17]
***tomwsmf-a has joined #archiveteam-bs [18:25]
joepie91SimpBrain: how dare you use the word 'cloud'
;)
[18:28]
SimpBrainwell it's not a physical dedicated server :P [18:28]
Froggingbutt server [18:29]
SimpBraintbh going into the future, it should be like cloud dedis especially for tiny companies and individuals, why do you physically need something physical to say it's yours? [18:30]
FroggingThat's pretty much the way it is already. But dedicated physical servers have advantages, such as not sharing system resources with other users, and you can usually get a whole hard disk to yuorself
yourself*
I have a VPS and a dedicated server, because sometimes I need more than 24GB of disk space and I don't want to pay $100/month for a higher VPS tier
[18:33]
SimpBrainyeah hdd space is what is killing vps for small time use [18:35]
yipdwdedicated physical server is so nice because I can be super-lazy in my Xen allocations and not give a shit
"how much for gitlab? fuck it, 8 gigs"
[18:36]
FroggingVPS definitely has its place though. They're very flexible and scalable
as with most things, it's not black-and-white "doing it this way is unquestionably better at everything"
[18:37]
yipdwthese days I read "flexible" as "fuck you, do it yourself" and "scalable" as "fuck you, pay us more for more nodes"
if you're on EC2 both are literally that
[18:38]
Froggingyipdw: Hah. Yeah it's a bit buzzwordy [18:39]
xmchahaha yes [18:39]
joepie91[19:28] <SimpBrain> well it's not a physical dedicated server :P [18:39]
FroggingBut I more meant that you can start an instance and do some stuff and then get rid of it without paying a setup fee up front [18:39]
joepie91scaleway? it absolutely is
the ARM pxes anyway
er
boxes
yep
[18:39]
FroggingIf I want to test some shit on a clean system with a clean connection, I just click "new Linode" [18:40]
yipdwI am also very annoyed at tracking down this one memory leak that is causing a load balancer to trigger scaling notifications which is causing an autoscaling group to go haywire [18:40]
joepie91still ARM boxes [18:40]
yipdwso I am probably biased [18:40]
joepie91SimpBrain: anyhow, "cloud" doesn't mean anything anyway
it's either a physical server, or a VM, and it might have hourly billing, or have an API for spinning them up
[18:40]
SimpBrainyeah [18:40]
joepie91or have high availability
or geographic redundancy
or a SAN
and any of these things might be indicated with 'cloud'
[18:40]
Froggingjoepie91: nah man [18:40]
joepie91in any combination [18:40]
***schbirid has joined #archiveteam-bs [18:40]
joepie91:p [18:40]
Froggingit's literally in the clouds
there's nothing physical about it
[18:40]
joepie91it's a meaningless buzzword basically [18:41]
Froggingto be fair, it has some degree of meaning. Unlike "internet of things" [18:41]
joepie91no, it really doesn't [18:41]
yipdwI store my files in a bong
personal cloud
[18:41]
Froggingi store my files in my butt [18:42]
yipdwanyway I don't know where this conversation started, what is it about [18:42]
Froggingdon't remember :p
Frogging scrolls up
SimpBrain said something about Scaleway
[18:42]
ErkDogFOS Makes me so sad :( [18:43]
Frogging[12:46:43] <@SimpBrain> wow scaleway not mucking about with cloud server prices [18:43]
ErkDog80Kbps :( [18:43]
yipdwfos has served us all well for years [18:43]
SimpBrainSimpBrain hides [18:43]
ErkDogI've got 45G of data waiting to be dumped.... :-/
like Wiki and GameTracker would be done if we could dump it somewhere, lol
or at least "caught up"
[18:43]
SimpBraingametrailers really hit fos hard
didnt help we was archiving 4 sites at the time i think
[18:47]
ErkDoglol gametrailers is a massive amount of data [18:47]
yipdwfos is not getting slammed like it was, maybe there's been some controls put on it
anyway the fos-to-ErkDog connection doesn't seem like the best either https://gist.github.com/yipdw/07994326c74c7ffa16e6
[18:47]
ErkDogwell I get 80K/sec here and about 125 from the server I am using
skyrim.towfowi.net
[18:48]
SketchCowI'm going to revisit FOS and its connection when I get there. [18:48]
ErkDogohhhh, yeah that's the trace you did [18:48]
yipdwit could be either end, I think blaming it on fos is premature [18:48]
ErkDogi'm on ha.wa.ecansol.net [18:48]
SketchCowNo, FOS is definitely doing something.
Something bad.
[18:49]
ErkDogpoor FOSy :(
or BAD FOSy whatever the case may be ;-D
[18:49]
SketchCowPart of it, of course, are the people going "Oh, it's not working fast, LET ME PUT 20 SIMULTANEOUS CONNECTIONS ON IT THAT WILL FIX IT" [18:49]
phuzionYeah, I'm hovering between 115 and 130KB/s going to FOS. [18:49]
SketchCowLuckily I can't tell who does this, which is why they are still alive [18:49]
ErkDogyeah cause they don't understand the idea of IO thrashing
SketchCow you should be able to
an incoming rsynch shows as a process, doesn't it?
[18:49]
SketchCowNo, if I do it, I'll just start murdering you fucks
All of you
[18:50]
ErkDogLOL
well you can only complain so much, if you want to download all the internet, you have to give us a place to put it bro
[18:50]
xmceep [18:50]
SketchCowIt'll be me and a room of corpses and me with a machete going "good meeting, good meeting" and chewing a sour patch kid [18:50]
phuzionhahaha [18:50]
ErkDognetstat -alnp|grep #### where ### is the port of your incoming rsynch connections will tell you too
at least it would tell you the # of connections from that IP, but not who owns it
but you could firewall off people who have 1,000 processes running and when they ask why they can't upload stuff, we can explain to them, that they need ONE process per Project, per server, at -most-
[18:51]
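ErkDog's netstat suggestion amounts to counting established connections per remote address on the rsync port. A rough sketch of that count follows, assuming Linux-style `netstat -an` output and port 873 (the rsync daemon default; the log does not say which port FOS uses). The sample addresses are made up.

```python
from collections import Counter

# Made-up sample in the column layout of Linux `netstat -anp`.
SAMPLE = """\
tcp        0      0 0.0.0.0:873             0.0.0.0:*               LISTEN      900/rsync
tcp        0      0 198.51.100.5:873        192.0.2.10:51234        ESTABLISHED 1201/rsync
tcp        0      0 198.51.100.5:873        192.0.2.10:51240        ESTABLISHED 1202/rsync
tcp        0      0 198.51.100.5:873        203.0.113.7:40000       ESTABLISHED 1203/rsync
tcp        0      0 198.51.100.5:22         203.0.113.7:40001       ESTABLISHED 1100/sshd
"""


def connections_per_ip(netstat_output, port):
    """Count ESTABLISHED TCP connections to local `port`, keyed by remote IP."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # The port is whatever follows the last colon of the local address.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


print(connections_per_ip(SAMPLE, 873).most_common())
```

As ErkDog notes, this only says how many connections an address holds, not who owns it, and a per-IP count says little about someone spreading uploads across a large subnet.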
yipdwI considered doing that and it is much harder to maintain than just finding the people and asking them to back off a bit [18:52]
ErkDogcouldn't you just tell the rsync/ssh protocol to only allow 2 connections per IP? [18:53]
yipdwyes but it's not a relevant defense
not when you have some people who have access to large subnets and are running warriors on all of them
[18:53]
ErkDogtrue
but when I look at the trackers
[18:54]
yipdwanyway, Atluxity is running a lot of traffic to fotolog [18:54]
ErkDogI only see like 5 or so people active on any given project [18:54]
yipdwyeah it's one person with a large number of nodes [18:54]
Froggingso it's people running a bunch of warriors on one machine that's hammering FOS? [18:55]
yipdwmany warriors on many machines [18:55]
FroggingIs more warriors not better?
Or are they doing it wrong
[18:55]
yipdwmore warriors is fine but there are limits to how fast we can take stuff in
this is just a limit
find why and work around it, etc
[18:55]
SketchCowI'm going to reboot the box. [18:56]
yipdwI also hate the word "scalable" because it gets people excited for no fucking reason [18:56]
SketchCowI do see that the upload speed just skyrocketed. [18:56]
FroggingPerhaps the system could be adjusted so that FOS coordinates who is uploading what and when [18:56]
yipdwSketchCow: you might want to hold off, it looks like DFJustin's doing a compile [18:57]
ErkDogwell likely the bottleneck is disk I/O [18:57]
SketchCowHe is ALWAYS doing a compile [18:57]
yipdwoh ok never mind [18:57]
SketchCowSTOP BEING MY MECHANICS FOR A MOMENT
I have two torrents going on the box, I'm trying to shut them down and avoid living a pile of buff
[18:57]
FroggingMaybe instead of warriors uploading things ASAP they could upload when FOS asks them to, to limit load [18:58]
ErkDogbecause as you add additional incoming rsynchs, the speed of all the existing transfers is diminished significantly, so 10 RSynchs take more than 10 times as long to complete as a single RSynch [18:58]
yipdwso [18:58]
SketchCowNo, no. The problem is just a matter of the fact that the machine got extended at one point and it never, ever goes back.
And then people "do things"
I wish I knew the command in rtorrent to say "and delete the data"
[18:59]
Frogginghttps://www.youtube.com/watch?v=EHybN9UbhWM [18:59]
ersiersi scales yipdw [19:01]
ErkDogIf you want to delete data on remove I would suggest adding the below to your rtorrent.rc. It will be both faster and more robust than rutorrent's delete function (which relies on php and a forked process) and has the benefit of not crashing rtorrent since it remembers state instead.
method.set_key = event.download.erased, remove_file,"execute={rm,-drf,--,$d.get_base_path=}"
[19:03]
PurpleSymrsyncd is able to execute a script before starting a transfer. One could check the current load and stop the transfer if it is too high. [19:03]
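PurpleSym's idea could look roughly like the script below, hooked in through the `pre-xfer exec` option of rsyncd.conf, where a non-zero exit status aborts the transfer before it starts. The threshold `MAX_LOAD` and the function names are hypothetical, and nothing in the log says FOS is configured this way.

```python
import os

MAX_LOAD = 8.0  # hypothetical threshold; tune for the actual machine


def should_accept(load_1min, max_load=MAX_LOAD):
    """Accept a new transfer only while the 1-minute load is at or below the limit."""
    return load_1min <= max_load


def main():
    """Return the exit status rsyncd should see: 0 to proceed, 1 to abort."""
    load_1min, _, _ = os.getloadavg()
    if not should_accept(load_1min):
        # As far as I know, rsyncd relays the script's stdout to the
        # client when the transfer is refused.
        print(f"load {load_1min:.1f} is too high, please retry later")
        return 1
    return 0
```

To use it as a standalone script, end the file with `raise SystemExit(main())` and point `pre-xfer exec` at it in the module's section of rsyncd.conf.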
SketchCowHa ha ha
HEY GUESS WHAT GUYS
I just found out there's a scheduled reboot of FOS anyway at 7pm EST
[19:05]
***bwn has quit IRC (Read error: Operation timed out) [19:08]
xmcbhahaha [19:09]
ErkDoglol [19:12]
***Start has joined #archiveteam-bs [19:22]
bwn has joined #archiveteam-bs [19:27]
..... (idle for 21mn)
SketchCowOK, so I'm going to see about shutting down my torrenting, cleaning up a few things, and then we get the reboot [19:48]
***SN4T14 has quit IRC (Remote host closed the connection)
SN4T14 has joined #archiveteam-bs
[19:59]
..... (idle for 21mn)
wackyDon't suppose anyone from the IA could gimme 5 min of time to hit a few questions off of them [20:23]
ErkDogsweet SketchCow thanks :-D [20:24]
JW_workwacky: toss your questions here — the worst that will happen is none of us will know or be willing to answer. [20:26]
wackyI work for a commercial archiving solution, we have a client (end user who owns the originally archived content) who is looking to get some content, them as the original content owner is it possible to get a warc/warc export?
They would have no problem paying for such a service
[20:29]
MrRadarThat's a question that would need to be addressed to the IA directly.
If it's content from the IA's Wayback Machine
[20:29]
JW_workI'd suggest sending that question to info@archive.org, providing (in the initial email) the specific URLs you are interested in, and whatever proof you have that you represent the original content owner. I have no idea whether that would be feasible, but it seems reasonable to me. [20:30]
MrRadarIf it's something that we (the Archiveteam) archived then the WARCs should already be available for download from the IA [20:32]
JW_workgood point. You can look up archivebot stuff with the viewer; for other stuff … probably search the wiki to see if it was a project. [20:34]
MrRadarFor reference, the ArchiveBot viewer is here: http://archive.fart.website/archivebot/viewer/ [20:35]
wackyAwesome - thanks all! Ill give the suggestions a shot [20:37]
JW_workcool, glad we were able to give you some pointers [20:38]
ErkDogsigh fart.website
lol a lot of the things archivebot is working on don't seem like "small" websites
one is @ 54 gigs, lol
[20:39]
MrRadarScroll down to the bottom of the dashboard to see some *really* big jobs [20:41]
ErkDogyeah lol one is 100 gigs, that one is 999 gigs? [20:41]
***Start has quit IRC (Quit: Disconnected.) [20:44]
SketchCow54gb is small [20:46]
phuzion54gb is tiny. I have a flash drive with more than 54gb of usable capacity.
Actually, I have like 3 or 4 laying around.
[20:49]
ErkDogLOL well I guess it depends on how you look at it
since -most- websites are like super tiny compared to that
we run a hosting company
our customer's largest site is 1.5 gigs, and it's eCommerce
[20:50]
MrRadarKeep in mind that the ArchiveBot saves web requests not necessarily what would be stored on the server
If you had a PHP script that printed an endless stream of random numbers that would be small on disk but the response would be huge
For full-site grabs we also tend to target sites that have lots of interesting stuff to save
[20:51]
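MrRadar's example of a tiny script with a huge response can be mimicked in a few lines. The sketch below is in Python rather than PHP, and the names are made up for illustration: the generator is a handful of lines on disk, but the number of bytes a crawler would record grows for as long as it keeps reading.

```python
import itertools
import random


def endless_numbers(seed=0):
    """Yield an unbounded stream of random numbers, one per line, as bytes."""
    rng = random.Random(seed)
    while True:
        yield f"{rng.randrange(10**9)}\n".encode("ascii")


def bytes_captured(chunks, limit_chunks):
    """Total bytes a client would have stored after reading `limit_chunks` chunks."""
    return sum(len(c) for c in itertools.islice(chunks, limit_chunks))


print(bytes_captured(endless_numbers(), 1_000))
print(bytes_captured(endless_numbers(), 100_000))
```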
ErkDoghmmm true [20:55]
***schbirid has quit IRC (Quit: Leaving) [21:04]
....... (idle for 31mn)
FalconKSketchCow: so do you want me to upload things to opensource with a special tag? or somewhere else? [21:35]
meh whatever I'll just upload them with subject: archivebot for now and we can always make more changes if desirable. [21:41]
***VADemon has joined #archiveteam-bs [21:43]
xmcyep
as long as they're separable from everything else
[21:45]
arkiverSketchCow: any taks this year in the Netherlands?
talks*
[21:52]
SketchCowNone planned, but then again this is the year I planned for not doing much speaking/travel except the Japan trip [21:53]
godanei figure a telethon at the end of the year at IA [21:54]
***fie has joined #archiveteam-bs [21:57]
FalconKok, much, much better
getting 5 mbit up into IA
the uploads are collection: opensource, subject: archivebot
content-type: web
who moves them?
[21:57]
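For reference, the metadata FalconK describes maps onto request headers in IA's S3-like upload API roughly as follows. This is a sketch under assumptions: "content-type: web" is read here as the web mediatype discussed earlier in the log, the helper name is hypothetical, and FalconK's uploader may set these fields differently.

```python
def ia_metadata_headers(collection="opensource", mediatype="web",
                        subject="archivebot"):
    """Build x-archive-* headers for a new item (hypothetical helper)."""
    return {
        # Ask IA to create the item if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
        "x-archive-meta-collection": collection,
        "x-archive-meta-mediatype": mediatype,
        "x-archive-meta-subject": subject,
    }


for name, value in ia_metadata_headers().items():
    print(f"{name}: {value}")
```

Moving the items later, as xmc says, would then mostly be a change to the collection field.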
yipdwif you can hold off the uploads until we can get that sorted out, that'd be nice
I don't think the viewer will find those
(until they get in the right place)
[22:00]
***metalcamp has quit IRC (Ping timeout: 258 seconds)
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
[22:07]
FalconKargh, since I already started, I can't. [22:15]
xmcno worries
items can always be moved
it's easy
[22:16]
FalconKit looks like mostly a metadata change yes [22:17]
xmcyup
metamgr can do it i think?
[22:17]
FalconKon the bright side, my pipeline is emptying out now and actually crawling things again
so
who are the individuals that are needed to sort it out?
[22:17]
xmcwhat's your IA account email address? [22:18]
FalconKfalcon@falconk.rocks [22:18]
xmcnerd
https://archive.org/details/archiveteam_archivebot_go_falconk_test_20160307www_youtube_com_20160306 this thingy
your item name is kind of fucky
[22:18]
FalconKyes, that was the test item [22:19]
xmcah [22:19]
FalconKthere is another, which was just uploaded, but isn't showing up under my uploads page
though I recall it taking a moment
[22:20]
yipdwoh, right, that's the main problem with distributed upload
s
naming
[22:21]
xmccan you get to metamgr with your account http://archive.org/metamgr.php?&w_uploader=falcon@falconk.rocks [22:21]
dxrtJust my 2c on this whole thing -I don't really want all the random crap my pipeline has grabbed to show up under my user account and linked to me - especially if something questionable is discovered later, it kind of seems like it'll be my liability and 'my upload' rather than the current system. [22:22]
FalconKxmc: not authorized [22:23]
xmcok [22:23]
yipdwdxrt: as far as I can tell, the rsync mode still exists [22:23]
FalconKyes
this change is very optional
[22:23]
yipdwI am however wondering how to name these items
the time-sequence thing doesn't work anymore
and UUID is not a solution
[22:23]
dxrtRight! I thought it was a current re-work of the current uploader, but I'm happy to hear that! [22:23]
xmcarchivebot_username_date ? [22:23]
yipdwmaybe, assuming username keeps all their clocks in sync [22:24]
FalconKso the way I am naming them now is like archiveteam_archivebot_go_falconk_content_radiosega_net_20160307 [22:24]
xmcwell is it a problem to put them in somewhat incorrect items
because timestamps exist in the datas
[22:24]
FalconKfor a crawl of content.radiosega.net which the crawler named with 20160307 as the date in the filename [22:24]
arkiverSketchCow: ok, the little archiveteam meeting last year was nice. We got some new project out of it too [22:24]
xmci thought you were doing one item per day per pipeliner [22:24]
FalconKwell I thought of doing that and then I wondered why I was associating items which had no logical association except that they were gathered proximally
I mean the item name is pretty arbitrary right?
[22:25]
xmcyes
it comes down to semantics really
i guess there's nothing wrong with item per archivebot job
[22:26]
yipdwyeah I guess in the end I'm ok with that [22:26]
xmcbut we do a bunch of single-page grabs too
xmc shrug
[22:26]
yipdwprovided the viewer can find them
yipdw checks
[22:27]
FalconKthere would be something wrong with one item per 5gb chunk
I doubt the viewer will find them until they are moved into a blessed collection
[22:27]
xmcan item per job, containing one or many warcs [22:27]
FalconKthey're in opensource with type web currently [22:27]
xmcsounds good to me [22:27]
yipdwoh, I meant that the viewer doesn't add additional criteria on top of collection
like /[0-9]+/
[22:27]
FalconKoh [22:27]
yipdwI think the answer is no? but I haven't checked [22:27]
FalconKI hope it doesn't! [22:27]
yipdwok the answer is probably "it's fine" [22:28]
FalconKit would be good to know but I have no way to verify
cool
I can help with whatever bulk crap needs doing as a result of this
[22:28]
yipdwat least https://github.com/ArchiveTeam/ArchiveBot/blob/master/viewer/archivebotviewer/database.py#L417, to me, indicates that we're clear [22:28]
FalconKarchivebot identifiers already have _, and I am doing some string translation [22:29]
***ndiddy has quit IRC (Read error: Connection reset by peer) [22:29]
yipdwit just occurred to me because we do have some tools that do that check [22:29]
FalconKthe translation is re.sub(r'[^0-9a-zA-Z-]+', '_', basename)
so DNS characters or _
[22:30]
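The translation FalconK quotes can be tried directly. In the sketch below only the `re.sub` call comes from the log; the wrapper names and the way the identifier is assembled are assumptions inferred from the example item name mentioned earlier.

```python
import re


def sanitize(name):
    """Replace each run of characters outside [0-9a-zA-Z-] with one underscore."""
    return re.sub(r'[^0-9a-zA-Z-]+', '_', name)


def item_identifier(user, site, date):
    """Assemble an identifier in the pattern seen in the log (assumed layout)."""
    return f"archiveteam_archivebot_go_{user}_{sanitize(site)}_{date}"


print(sanitize("content.radiosega.net"))
print(item_identifier("falconk", "content.radiosega.net", "20160307"))
```

Because the pattern ends in `+`, consecutive disallowed characters collapse into a single underscore, and hyphens are left alone.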
***ndiddy has joined #archiveteam-bs [22:30]
yipdwyeah those'll be fine
AFAICT
[22:30]
FalconK:)
if not, we'll see.
and... wow, I have made a thing that uploads over 1GB per hour of internet into the archive.
[22:30]
yipdwnice [22:32]
FalconKFalconK enjoys this [22:32]
yipdwthe only other place I've seen that is on another rsync target we have
it's Kenshin's
[22:32]
FalconKwell actually 1GB per 15min [22:33]
yipdwas far as I can tell Kenshin basically owns Singapore [22:33]
FalconKthis just has 1gbps upstream
nothing special about it besides that
the transfer rate is really, really fluttery though
they end up looking like this:
https://archive.org/details/archiveteam_archivebot_go_falconk_content_radiosega_net_20160307
the non-viewability seems to be common to WARCs in opensource
[22:33]
yipdwyeah
the WARC also doesn't have extension .warc.gz for some reason
[22:36]
FalconKit is _warc_gz
hmm.
is that my doing?
[22:37]
yipdwit could be; IIRC wpull does .warc.gz [22:38]
FalconKyes, it is my doing [22:38]
yipdwI don't think it matters for derives (though maybe it does), but it can matter for browser downloads
and etc
[22:38]
FalconKlet me fix that. [22:38]
arkiverit matters for derives [22:38]
yipdwoh [22:38]
FalconKfixed. targets are now like /archiveteam_archivebot_go_falconk_content_radiosega_net_20160307/content.radiosega.net-inf-20160307-051602-1qvpq-00001.warc.gz [22:42]
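The `_warc_gz` suffix is what the quoted substitution produces if it is applied to the whole filename, dots included. The sketch below shows that effect and one way to get a target shaped like the fixed one: sanitize only the item identifier and leave the WARC filename untouched. FalconK's actual fix is not shown in the log, so `upload_target` is a hypothetical reconstruction.

```python
import re


def sanitize(name):
    """Replace each run of characters outside [0-9a-zA-Z-] with one underscore."""
    return re.sub(r'[^0-9a-zA-Z-]+', '_', name)


def upload_target(identifier, warc_filename):
    """Sanitize the identifier only; the filename keeps its .warc.gz extension."""
    return f"/{sanitize(identifier)}/{warc_filename}"


filename = "content.radiosega.net-inf-20160307-051602-1qvpq-00001.warc.gz"

# Sanitizing the filename as well turns ".warc.gz" into "_warc_gz".
print(sanitize(filename))

print(upload_target(
    "archiveteam_archivebot_go_falconk_content_radiosega_net_20160307",
    filename,
))
```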
yipdwcool [22:42]
FalconKnow is there some way to rename the one extant misnamed file
... probably not.
not by me anyway.
[22:42]
....... (idle for 32mn)
VADemonDoes anyone know, is 1GB softlimit per WARC file still recommended for mirrors or should it be raised? [23:16]
ErkDogftp ftp RSynch target is fast
the*
I can dump @ 15M/sec from 2 different systems all day
[23:21]
FalconK** rsync [23:28]
***tomwsmf-a has quit IRC (Read error: Operation timed out) [23:29]
ErkDogso you made it so you can upload directly into IA FalconK instead of having to rsynch it somewhere? [23:30]
***xXx_ndidd has joined #archiveteam-bs
fie_ has joined #archiveteam-bs
hawc145 has joined #archiveteam-bs
RichardG_ has joined #archiveteam-bs
phuz has joined #archiveteam-bs
Start has joined #archiveteam-bs
is-_ has joined #archiveteam-bs
ndiddy has quit IRC (hub.efnet.us irc.servercentral.net)
dashcloud has quit IRC (hub.efnet.us irc.servercentral.net)
fie has quit IRC (hub.efnet.us irc.servercentral.net)
RichardG has quit IRC (hub.efnet.us irc.servercentral.net)
ohhdemgir has quit IRC (hub.efnet.us irc.servercentral.net)
yipdw has quit IRC (hub.efnet.us irc.servercentral.net)
signius has quit IRC (hub.efnet.us irc.servercentral.net)
HCross has quit IRC (hub.efnet.us irc.servercentral.net)
ErkDog has quit IRC (hub.efnet.us irc.servercentral.net)
chfoo has quit IRC (hub.efnet.us irc.servercentral.net)
toad1 has quit IRC (hub.efnet.us irc.servercentral.net)
JW_work has quit IRC (hub.efnet.us irc.servercentral.net)
phuzion has quit IRC (hub.efnet.us irc.servercentral.net)
is- has quit IRC (hub.efnet.us irc.servercentral.net)
MrRadar has quit IRC (hub.efnet.us irc.servercentral.net)
chazchaz has quit IRC (hub.efnet.us irc.servercentral.net)
Laverne has quit IRC (hub.efnet.us irc.servercentral.net)
SimpBrain has quit IRC (hub.efnet.us irc.servercentral.net)
zino_ has quit IRC (hub.efnet.us irc.servercentral.net)
Infreq has quit IRC (hub.efnet.us irc.servercentral.net)
Darkstar has quit IRC (hub.efnet.us irc.servercentral.net)
slyphic has quit IRC (hub.efnet.us irc.servercentral.net)
Frogging has quit IRC (hub.efnet.us irc.servercentral.net)
dcmorton has quit IRC (hub.efnet.us irc.servercentral.net)
Cameron_D has quit IRC (hub.efnet.us irc.servercentral.net)
dxrt has quit IRC (hub.efnet.us irc.servercentral.net)
atlogbot has quit IRC (hub.efnet.us irc.servercentral.net)
swebb has quit IRC (hub.efnet.us irc.servercentral.net)
Famicoma1 has quit IRC (Ping timeout: 270 seconds)
chazchaz_ has joined #archiveteam-bs
yipdw_ has joined #archiveteam-bs
[23:37]
FalconKFalconK looks at the sadness that is efnet [23:44]
***dxrt_ has joined #archiveteam-bs
Infreq_ has joined #archiveteam-bs
ErkDog_ has joined #archiveteam-bs
swebb_ has joined #archiveteam-bs
Frogging_ has joined #archiveteam-bs
chfoo0 has joined #archiveteam-bs
zino__ has joined #archiveteam-bs
SimpBrai1 has joined #archiveteam-bs
[23:44]
FalconKErkDog_: yes, I did.
(so we can forget that the correct spelling of the project name is rsync, or that it even exists, perhaps...) ;)
[23:48]
***pi has joined #archiveteam-bs
pi is now known as MrRadar_
[23:51]
ErkDog_lol
soz :-D
[23:56]
***ErkDog_ is now known as ErkDog
dashcloud has joined #archiveteam-bs
swebb_ is now known as swebb
Frogging_ is now known as Frogging
JW_work has joined #archiveteam-bs
MrRadar_ is now known as MrRadar
toad1 has joined #archiveteam-bs
slyphic has joined #archiveteam-bs
[23:57]