Time |
Nickname |
Message |
00:12
🔗
|
godane |
i'm recapturing a tape cause of sync issues |
00:22
🔗
|
godane |
ok now i think the recording is out of sync |
00:22
🔗
|
godane |
the other recording on this tape has no sync issues |
00:26
🔗
|
godane |
so the recording was the movie Christine on CBS on 1987-04-18 |
00:27
🔗
|
|
SimpBrain has quit IRC (Read error: Operation timed out) |
00:29
🔗
|
|
SimpBrain has joined #archiveteam-bs |
00:33
🔗
|
|
godane has quit IRC (Ping timeout: 246 seconds) |
00:35
🔗
|
JAA |
BartoCH: ^ That should be everything I threw into ArchiveBot. Looks like I didn't do anything for the March 2018 vote (Billag & finances). Once the bot runs, we'll get pretty tables for everything. :-) |
00:36
🔗
|
|
tammy_ has joined #archiveteam-bs |
00:36
🔗
|
tammy_ |
I'm the one who has the InterfaceLIFT warc scrape from a year or 2 ago. Is there a tool to upload this to IA that I can ratelimit the upload? I'd prefer to do it that way rather than go through the web interface. |
00:36
🔗
|
|
evul_ has joined #archiveteam-bs |
00:38
🔗
|
JAA |
Sorry, didn't see your message on Reddit since I was busy adding stuff to our wiki. |
00:38
🔗
|
tammy_ |
no worries |
00:38
🔗
|
JAA |
Looks like "ia" doesn't have a rate limiting option, but I think you can also upload with curl, and that should have an option somewhere. |
00:38
🔗
|
tammy_ |
long time no see jaa |
00:38
🔗
|
JAA |
Indeed :-) |
00:38
🔗
|
JAA |
https://archive.org/help/abouts3.txt has details on how to upload with curl. |
00:39
🔗
|
tammy_ |
ok, I'll look into that after dinner. playing some rocket league. dataset is nice and safe though :) |
00:39
🔗
|
JAA |
With that large an item, make sure to provide a size hint (described somewhere in that document). |
00:40
🔗
|
JAA |
But if you can, I'd suggest you just use the "ia" tool instead since it's the canonical way of uploading large amounts of data to IA. |
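For the curl route mentioned above, a minimal sketch of how a rate-limited upload command might be assembled. `--limit-rate`, `--location`, `--header` and `--upload-file` are standard curl options, and the `authorization: LOW`, `x-amz-auto-make-bucket` and `x-archive-size-hint` headers are the ones described in abouts3.txt. The function name, the item identifier, the keys, the `https` scheme and the default rate are placeholders chosen here for illustration, not taken from the log; check the document before relying on any of them.

```python
import os


def build_curl_upload(path, identifier, access_key, secret_key,
                      size_hint_bytes, rate="500k"):
    """Build a curl argv list for a rate-limited upload to IA's S3-like API.

    All argument values are placeholders; size_hint_bytes should be the
    expected total size of the item in bytes.
    """
    filename = os.path.basename(path)
    return [
        "curl",
        "--location",
        # curl's own bandwidth cap, e.g. "500k" or "1m" (bytes per second).
        "--limit-rate", rate,
        "--header", f"authorization: LOW {access_key}:{secret_key}",
        # Create the item if it does not exist yet.
        "--header", "x-amz-auto-make-bucket:1",
        # Size hint for large items, as recommended in abouts3.txt.
        "--header", f"x-archive-size-hint:{size_hint_bytes}",
        "--upload-file", path,
        f"https://s3.us.archive.org/{identifier}/{filename}",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a shell string avoids quoting problems with file names.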
00:40
🔗
|
|
godane has joined #archiveteam-bs |
00:41
🔗
|
godane |
i'm taking a break digitizing dashcloud tapes |
00:42
🔗
|
godane |
i was going to use vlc to sync the earlier rip but then vlc crashed the system |
00:43
🔗
|
godane |
like the mouse moved but nothing responded to it |
00:48
🔗
|
JAA |
tammy_: I linked the tool in my PM, by the way, but here's the link again: https://archive.org/services/docs/api/internetarchive/ (Python package "internetarchive") |
01:05
🔗
|
|
marked has quit IRC (Read error: Operation timed out) |
01:06
🔗
|
|
marked has joined #archiveteam-bs |
01:06
🔗
|
|
marked has quit IRC (west.us.hub irc.Prison.NET) |
01:06
🔗
|
|
godane has quit IRC (west.us.hub irc.Prison.NET) |
01:06
🔗
|
|
achip has quit IRC (west.us.hub irc.Prison.NET) |
01:10
🔗
|
|
Exairnous has joined #archiveteam-bs |
01:16
🔗
|
|
achip has joined #archiveteam-bs |
01:16
🔗
|
|
marked has joined #archiveteam-bs |
01:16
🔗
|
|
godane has joined #archiveteam-bs |
01:23
🔗
|
|
BlueMax has joined #archiveteam-bs |
01:32
🔗
|
|
Dimtree has joined #archiveteam-bs |
01:40
🔗
|
|
ndiddy has joined #archiveteam-bs |
01:41
🔗
|
Exairnous |
JAA: I see what you mean about youtube on IA. Is there any way to put a link on the youtube video page to the actual video |
01:41
🔗
|
Exairnous |
or do you know of another place to save youtube that handles playback better? |
02:17
🔗
|
|
bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…) |
02:56
🔗
|
JAA |
Exairnous: No, fortunately there is no way to add links like that to the Wayback Machine. It could completely compromise the authenticity of the archived snapshot, since it would be a fake response. And no, I'm not aware of any good solution to archiving YouTube pages. Playback will always be tricky with sites like that; even if you use a full browser for the archival etc., playback might happen on a |
02:56
🔗
|
JAA |
different browser or platform, which can change which URLs (or in this case, which video resolution, for example) are requested, thus breaking the playback. Archiving these things is a huge nightmare really. The best solution is to extract the relevant information from it and present it in a more sane way. |
03:02
🔗
|
Exairnous |
JAA: What about links/embeds to the archived youtube videos on an archived website? Will they still work? |
03:05
🔗
|
JAA |
Exairnous: No, probably not. The WBM has some stuff for handling YT videos specially though and replacing them with their own player, but I'm not sure if we can feed into that in any way. |
03:06
🔗
|
Exairnous |
:( |
03:07
🔗
|
Exairnous |
JAA: There are several youtube links in pages on ngharmony.ca. Is there any way to have them resolve correctly after the youtube channel is taken down? |
03:08
🔗
|
JAA |
Exairnous: Define "resolve correctly". They'll still point to the same YouTube pages, which will be broken in the WBM. |
03:10
🔗
|
Flashfire |
Um I am having trouble with the save now feature |
03:10
🔗
|
Flashfire |
it 404s when I try and save a page |
03:10
🔗
|
Exairnous |
JAA: Point to a working video. |
03:10
🔗
|
JAA |
Exairnous: Almost certainly no. |
03:11
🔗
|
Exairnous |
:( |
03:11
🔗
|
Flashfire |
https://web.archive.org/save/https://www.youtube.com/watch?v=El41sHXck-E gave me a 404 damn it |
03:11
🔗
|
JAA |
YouTube's fault really. If they simply used an HTML5 <video> tag, it would all work fine. |
03:11
🔗
|
Flashfire |
Don't ask why I am trying to save youtube videos like that, but it's annoying that it's not working and 404ing on me |
03:11
🔗
|
|
godane has quit IRC (Ping timeout: 255 seconds) |
03:12
🔗
|
Flashfire |
Can someone else check if they are having problems with the save now feature please, or if it's just me? |
03:12
🔗
|
JAA |
Flashfire: Works fine for me. Well, except the saved page is broken, but that's expected. |
03:13
🔗
|
Flashfire |
I mean does using the https://web.archive.org/save/ work for any page |
03:13
🔗
|
Flashfire |
it just 404s when I click save page |
03:13
🔗
|
JAA |
Yes, I simply visited the link you pasted, and it archived the page. |
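The direct-link form being discussed (as opposed to the form on the save page) is just the save endpoint with the target URL appended, as in the link pasted earlier. A tiny sketch, where `save_now_url` is a name made up here for illustration:

```python
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_now_url(target_url):
    """Return the direct 'save now' link for a page.

    The target URL is appended as-is, matching the links pasted in the channel.
    """
    return SAVE_ENDPOINT + target_url
```

Visiting the returned URL in a browser is what triggered the capture in the exchange above; whether the capture then plays back correctly is a separate question.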
03:13
🔗
|
Exairnous |
JAA: I just had a look at a youtube video with a browser inspector. I'm pretty sure it had a video tag with a blob: link. |
03:14
🔗
|
JAA |
https://web.archive.org/web/20190308031204/https://www.youtube.com/watch?v=El41sHXck-E |
03:15
🔗
|
Flashfire |
Try visiting the save now page and saving a random link. It won't work for me |
03:15
🔗
|
Flashfire |
The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again. |
03:15
🔗
|
Flashfire |
I get that error |
03:15
🔗
|
JAA |
Exairnous: Yeah, they use a <video> tag but then modify its contents with JS. Specifically, the <source> tag has an attribute src$="[[videoThumbnail_.url]]" instead of simply src with an actual URL. |
03:15
🔗
|
JAA |
What I really mean is a site that just works without JS. |
03:16
🔗
|
Exairnous |
of course they do :/ |
03:16
🔗
|
|
SimpBrain has quit IRC (Read error: Operation timed out) |
03:17
🔗
|
JAA |
Yeah, modern websites need to use at least three frameworks on both client and server, otherwise they're not modern enough. |
03:18
🔗
|
JAA |
Well, welcome to the hell that is archiving JS-heavy websites. :-) |
03:19
🔗
|
Exairnous |
JS-heavy purposely obfuscated websites :P |
03:20
🔗
|
Exairnous |
Cause I'm fairly sure WM can playback at least some JS? |
03:20
🔗
|
JAA |
Oh, JS on its own works fine. It's the XMLHttpRequests and similar stuff which break. |
03:21
🔗
|
Exairnous |
that sounds like it needs a server to wrok properly |
03:21
🔗
|
Exairnous |
*work |
03:22
🔗
|
|
SimpBrain has joined #archiveteam-bs |
03:22
🔗
|
|
underscor has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
Hani has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
noirscape has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
argus has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
arbin has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
ReimuHaku has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
Ganonmast has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
PurpleSym has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
kisspunch has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
Frogging has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
jodizzle has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
VoynichCr has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
MrRadar2 has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
Tenebrae has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
|
BnAboyZ has quit IRC (hub.efnet.us irc.efnet.nl) |
03:22
🔗
|
JAA |
Yeah, but IA's URLs also come into play since the absolute URL is different. |
03:22
🔗
|
JAA |
WBM's URLs* |
03:22
🔗
|
|
slyphic has quit IRC (Read error: Operation timed out) |
03:22
🔗
|
|
slyphic has joined #archiveteam-bs |
03:23
🔗
|
|
Jopik has joined #archiveteam-bs |
03:23
🔗
|
JAA |
I think someone here (PurpleSym?) had a PoC of something based on service workers for rewriting URLs on the fly. |
03:23
🔗
|
JAA |
WBM currently works by rewriting anything that looks like a URL statically. |
03:23
🔗
|
JAA |
That solution would instead hijack any requests sent by the browser, rewrite them into the equivalent WBM URLs, and then send that request instead. |
03:24
🔗
|
JAA |
That still doesn't help with the potential differences in URLs based on browser versions etc. though. |
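The request-rewriting idea described above boils down to mapping each URL the page asks for onto the matching Wayback Machine URL. A real implementation would live in a browser service worker written in JavaScript; the Python below only sketches the mapping itself, using the `https://web.archive.org/web/<timestamp>/<url>` shape visible in the snapshot link earlier in the log. The function name and the fixed-timestamp approach are assumptions made for illustration, not the PoC mentioned in the channel.

```python
from urllib.parse import urljoin

WAYBACK_PREFIX = "https://web.archive.org/web/"


def to_wayback_url(request_url, page_url, timestamp):
    """Map a (possibly relative) request URL to its Wayback Machine equivalent.

    request_url: URL as requested by the page, relative or absolute.
    page_url:    original URL of the archived page, used to resolve relatives.
    timestamp:   snapshot timestamp such as "20190308031204".
    """
    absolute = urljoin(page_url, request_url)
    # Leave URLs alone if they already point into the Wayback Machine.
    if absolute.startswith(WAYBACK_PREFIX):
        return absolute
    return f"{WAYBACK_PREFIX}{timestamp}/{absolute}"
```

As noted in the discussion, this kind of mapping does nothing about requests that differ between browsers or platforms: if the replaying browser asks for a URL that was never captured, the rewritten URL simply has no snapshot behind it.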
03:24
🔗
|
|
underscor has joined #archiveteam-bs |
03:24
🔗
|
|
Hani has joined #archiveteam-bs |
03:24
🔗
|
|
noirscape has joined #archiveteam-bs |
03:24
🔗
|
|
argus has joined #archiveteam-bs |
03:24
🔗
|
|
arbin has joined #archiveteam-bs |
03:24
🔗
|
|
ReimuHaku has joined #archiveteam-bs |
03:24
🔗
|
|
Ganonmast has joined #archiveteam-bs |
03:24
🔗
|
|
PurpleSym has joined #archiveteam-bs |
03:24
🔗
|
|
kisspunch has joined #archiveteam-bs |
03:24
🔗
|
|
Frogging has joined #archiveteam-bs |
03:24
🔗
|
|
jodizzle has joined #archiveteam-bs |
03:24
🔗
|
|
VoynichCr has joined #archiveteam-bs |
03:24
🔗
|
|
MrRadar2 has joined #archiveteam-bs |
03:24
🔗
|
|
Tenebrae has joined #archiveteam-bs |
03:24
🔗
|
|
BnAboyZ has joined #archiveteam-bs |
03:24
🔗
|
|
irc.efnet.nl sets mode: +oo PurpleSym MrRadar2 |
03:24
🔗
|
JAA |
And the archival would still require a full browser, which is very inefficient compared to our usual methods of archiving things. |
03:25
🔗
|
|
tammy_ has quit IRC (Ping timeout: 261 seconds) |
03:25
🔗
|
|
VerifiedJ has quit IRC (Ping timeout: 252 seconds) |
03:25
🔗
|
Exairnous |
JAA: Would something like Rhizome's Webrecorder produce a warc that could be uploaded to IA and playback correctly? Or does youtube have to be dynamic? |
03:26
🔗
|
JAA |
The problem partially lies in the Wayback Machine itself. |
03:26
🔗
|
JAA |
So no, playback would almost certainly not work correctly. |
03:27
🔗
|
|
flipflop has quit IRC (Read error: Operation timed out) |
03:32
🔗
|
Flashfire |
JAA try putting https://vaguthu.mv/evaguthu/163689 through the save now page found at https://web.archive.org/save/ |
03:32
🔗
|
Flashfire |
it's not working |
03:34
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
03:44
🔗
|
Flashfire |
wtf is going on for me to not be able to use the save now feature |
03:45
🔗
|
Flashfire |
Have I been marked as a spammer from the weird urls? |
04:07
🔗
|
|
odemgi has joined #archiveteam-bs |
04:09
🔗
|
|
odemgi_ has quit IRC (Ping timeout: 252 seconds) |
04:13
🔗
|
hook54321 |
Flashfire: What message do you get when trying to save it? |
04:14
🔗
|
Flashfire |
The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again. |
04:15
🔗
|
hook54321 |
I've had that happen a few times, not sure why. Try another URL and see what happens. |
04:15
🔗
|
|
odemg has quit IRC (Ping timeout: 615 seconds) |
04:19
🔗
|
Flashfire |
it does it with other urls as well |
04:22
🔗
|
|
odemg has joined #archiveteam-bs |
04:30
🔗
|
|
VerifiedJ has quit IRC (Ping timeout: 252 seconds) |
04:34
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
04:46
🔗
|
|
qw3rty111 has joined #archiveteam-bs |
04:49
🔗
|
|
m007a83_ is now known as m007a83 |
04:50
🔗
|
|
qw3rty119 has quit IRC (Read error: Operation timed out) |
04:54
🔗
|
|
HashbangI has quit IRC (Read error: Operation timed out) |
04:54
🔗
|
|
nataraj_ has joined #archiveteam-bs |
04:55
🔗
|
|
a_spook_ has joined #archiveteam-bs |
04:57
🔗
|
|
HashbangI has joined #archiveteam-bs |
04:57
🔗
|
a_spook_ |
Flashfire: dunno if it helps, but I have weird issues with wayback sometimes and they go away when I clear cookies? Though, I don't think it was the same error message as yours. |
05:02
🔗
|
a_spook_ |
Flashfire: also I just did https://web.archive.org/save/https://www.youtube.com/watch?v=El41sHXck-E&disable_polymer=1 because ew modern youtube :P |
05:03
🔗
|
Flashfire |
Yeah see doing it that way works but using the save now button uses less of my computer's resources. or at least makes the fan not scream as loud |
05:03
🔗
|
Flashfire |
but it's not letting me do that |
05:08
🔗
|
|
ndiddy_ has joined #archiveteam-bs |
05:09
🔗
|
a_spook_ |
Flashfire: ah I see, I missed that page you said you were using, sorry |
05:12
🔗
|
Flashfire |
a_spook_ Yeah trying to use the save now page |
05:13
🔗
|
|
ndiddy has quit IRC (Ping timeout: 492 seconds) |
05:30
🔗
|
a_spook_ |
Flashfire: I've actually never used that before and just tried it. I'm getting the same error as you on a random page I chose to test. Guess it's not just you then! :') |
05:37
🔗
|
|
SimpBrain has quit IRC (Read error: Connection reset by peer) |
05:39
🔗
|
|
SimpBrain has joined #archiveteam-bs |
05:42
🔗
|
Exairnous |
Is it better to use an IA save now bookmarklet or archivebot for a single page? |
05:44
🔗
|
Exairnous |
Cause I looked at one of the youtube embeds in my site (seems to be in IA now, yay) and the iframe link resolves to a valid link not in the wayback machine |
05:45
🔗
|
Exairnous |
I think if I archive that link the embed may work, but I'm wondering whether to use archivebot or a bookmarklet. |
05:52
🔗
|
|
wp494 has quit IRC (Read error: Operation timed out) |
05:52
🔗
|
|
wp494 has joined #archiveteam-bs |
06:14
🔗
|
|
ndiddy_ has quit IRC () |
06:58
🔗
|
Exairnous |
JAA: ^^ |
07:25
🔗
|
|
SimpBrain has quit IRC (Remote host closed the connection) |
07:25
🔗
|
|
SimpBrain has joined #archiveteam-bs |
07:34
🔗
|
|
SimpBrain has quit IRC (Remote host closed the connection) |
07:34
🔗
|
|
SimpBrain has joined #archiveteam-bs |
07:44
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
07:44
🔗
|
|
SimpBrain has quit IRC (Read error: Connection reset by peer) |
07:47
🔗
|
|
Pixi` has joined #archiveteam-bs |
07:48
🔗
|
|
Pixi has quit IRC (Read error: Operation timed out) |
07:51
🔗
|
|
SimpBrain has joined #archiveteam-bs |
07:54
🔗
|
|
VerifiedJ has quit IRC (Ping timeout: 252 seconds) |
07:58
🔗
|
|
SimpBrain has quit IRC (Remote host closed the connection) |
08:05
🔗
|
|
SimpBrain has joined #archiveteam-bs |
08:06
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
08:30
🔗
|
|
S1mpbrain has joined #archiveteam-bs |
08:30
🔗
|
|
SimpBrain has quit IRC (Remote host closed the connection) |
08:47
🔗
|
|
lag__ has joined #archiveteam-bs |
08:55
🔗
|
|
S1mpbrain has quit IRC (Ping timeout: 615 seconds) |
10:13
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 740 seconds) |
10:14
🔗
|
|
Mateon1 has joined #archiveteam-bs |
10:38
🔗
|
JAA |
Exairnous: Not sure. Both methods have advantages and disadvantages. But I don't know which works better for YouTube. |
10:39
🔗
|
JAA |
VoynichCr: Oh, awesome! I was looking for a way to do that but couldn't figure it out. :-) |
10:40
🔗
|
JAA |
I guess there's no way to filter out the /list pages, right? |
10:44
🔗
|
|
Hani has quit IRC (Read error: Connection reset by peer) |
10:44
🔗
|
|
Hani has joined #archiveteam-bs |
10:51
🔗
|
|
Gfy has quit IRC (Ping timeout: 265 seconds) |
10:54
🔗
|
|
a_spook_ has quit IRC (Quit: Connection closed for inactivity) |
11:03
🔗
|
|
Gfy has joined #archiveteam-bs |
12:16
🔗
|
|
bitBaron has joined #archiveteam-bs |
13:42
🔗
|
|
bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…) |
13:52
🔗
|
|
godane has joined #archiveteam-bs |
14:15
🔗
|
|
bitBaron has joined #archiveteam-bs |
14:53
🔗
|
|
wp494 has quit IRC (Ping timeout: 492 seconds) |
14:55
🔗
|
|
wp494 has joined #archiveteam-bs |
15:43
🔗
|
|
deevious has quit IRC (Quit: deevious) |
16:02
🔗
|
|
Oddly has joined #archiveteam-bs |
16:03
🔗
|
|
schbirid has joined #archiveteam-bs |
16:26
🔗
|
|
VerifiedJ has quit IRC (Ping timeout: 252 seconds) |
17:27
🔗
|
|
Hani111 has joined #archiveteam-bs |
17:28
🔗
|
|
Hani has quit IRC (Read error: Connection reset by peer) |
17:28
🔗
|
|
Hani111 is now known as Hani |
17:38
🔗
|
|
Oddly has quit IRC (Ping timeout: 255 seconds) |
17:44
🔗
|
|
VerifiedJ has joined #archiveteam-bs |
17:48
🔗
|
JAA |
My wiki bot will now keep the WBM exclusion list sorted. |
18:02
🔗
|
|
nataraj_ has quit IRC (Read error: Operation timed out) |
18:16
🔗
|
|
Oddly has joined #archiveteam-bs |
18:55
🔗
|
VoynichCr |
JAA: i don't think that filtering /list pages is possible |
19:18
🔗
|
|
bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…) |
19:35
🔗
|
Exairnous |
JAA: Does saving with a bookmarklet interfere with what archivebot got? |
19:36
🔗
|
|
bitBaron has joined #archiveteam-bs |
19:47
🔗
|
|
evul_ is now known as evul |
19:58
🔗
|
|
BlueMax has joined #archiveteam-bs |
21:05
🔗
|
|
Albardin has quit IRC (Read error: Operation timed out) |
21:07
🔗
|
|
Oddly has quit IRC (Ping timeout: 255 seconds) |
21:08
🔗
|
|
kiskabak has quit IRC (Ping timeout: 265 seconds) |
21:13
🔗
|
|
Hani has quit IRC (Read error: Operation timed out) |
21:13
🔗
|
|
Hani has joined #archiveteam-bs |
21:20
🔗
|
|
Hani has quit IRC (Ping timeout: 268 seconds) |
21:20
🔗
|
|
Hani has joined #archiveteam-bs |
22:39
🔗
|
|
wyatt8740 has joined #archiveteam-bs |
23:00
🔗
|
JAA |
Exairnous: Well, the Wayback Machine is one big mixture of WARCs from all over the place, including the "save now" feature and ArchiveBot. Meaning, when you view the AB snapshot, you may also see content (e.g. images, stylesheets, scripts) from "save now" and vice-versa. So yes, it could interfere in that way. But the AB snapshot itself won't be affected by it in any way. |
23:01
🔗
|
jodizzle |
Seems like a lot of the Venezuelan sites are down right now. Wonder if it's because of this: https://www.theguardian.com/world/2019/mar/07/venezuela-hit-by-major-power-outage |
23:06
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
23:18
🔗
|
|
MR9K has quit IRC (Remote host closed the connection) |
23:19
🔗
|
|
MR9K has joined #archiveteam-bs |
23:39
🔗
|
Gfy |
SketchCow: is there a chance day addnfo-2010-1020.zip got skipped in the process somewhere? (regarding https://archive.org/download/nfo_large_collection_2009_2012) |
23:58
🔗
|
|
wp494 has quit IRC (Read error: Operation timed out) |
23:59
🔗
|
|
wp494 has joined #archiveteam-bs |